Abstract


In this monograph, I place indices used to measure the uneven distribution dimen- 
sion of residential segregation in a new framework; I cast them as simple differ- 
ences of group means on individual-level residential outcomes scored from area 
racial composition. The “difference-of-group-means” framework places all popular 
indices in a common measurement framework in which index scores are additively 
determined by individual residential attainments. This yields new and appealing 
options regarding substantive interpretations of the scores of segregation indices. It 
also brings important methodological benefits by creating the new possibility of 
joining the investigation of aggregate segregation and the investigation of 
individual-level residential attainments together in a single analysis. Specifically, 
segregation index scores now can be equated with the effect of group membership 
(e.g., race) on individual residential attainments, and thus variation in segregation 
over time and across cities can be equated to the ways that the effect of group mem- 
bership varies over time and with city characteristics in multilevel models of 
individual residential attainments. Framing segregation indices in the difference-of- 
group-means framework has several other desirable consequences for segregation 
analysis. It creates opportunities to investigate segregation in new ways by permit- 
ting researchers to assess the impact of group membership on residential outcomes 
in the context of multivariate attainment models that if desired can include controls 
for other individual characteristics (e.g., language, education, income). Relatedly, it 
suggests a new basis on which to evaluate and compare segregation indices — 
whether the individual-level residential outcomes they register and reflect are rele- 
vant for theories of residential dynamics and/or are relevant for concerns about 
racial differences in socioeconomic attainments and life chances. Finally, the 
difference-of-group-means framework paves the way for developing refined ver- 
sions of indices that are free of potentially problematic upward bias intrinsic to 
standard formulations of these indices. Significantly, adopting the new framework 
outlined here does not require breaking with previous conceptions of segregation; 
results of empirical analyses of segregation using traditional computing formulas 
can be exactly replicated within this framework even as several new options for 
measurement and analysis become available. 

Preface

In this monograph, I review findings and observations I have accumulated while
grappling with issues in segregation measurement over the past decade. My explo- 
rations in this area were motivated by three concerns. The first was that, while it is 
obvious to all concerned that residential segregation can potentially have important 
consequences for group differences in residential outcomes, the literature on segre- 
gation measurement does not provide formulations of segregation indices that make 
it clear exactly what implications index scores have for group differences in residen- 
tial outcomes. In this regard, the measurement and analysis of segregation is on a 
different conceptual footing from standard approaches to measuring and analyzing 
intergroup disparity and inequality on other socioeconomic and stratification out- 
comes such as education, occupation, and income. Researchers investigating dis- 
parities in these other areas routinely assess inequality and disparities based on 
comparisons of group means on individual-level outcomes. Consequently, scores of
measures of aggregate inequality have clear and direct
implications for group differences in the attainments of individuals. In contrast, the
literature on segregation measurement has not established how segregation index 
scores are connected to group differences on residential outcomes for individuals. 
This is surprising and unfortunate because the substantive relevance of segregation 
indices ultimately rests on the presumption that their scores carry important impli- 
cations for group differences on individual residential outcomes and yet these impli- 
cations have remained obscure. I address this concern here by introducing new 
formulations of popular segregation indices that place them in an overarching 
“difference-of-group-means” framework that clarifies exactly how segregation 
index scores are connected to group differences in individual-level residential 
outcomes. 

The second concern motivating me was that the literature on segregation mea- 
surement and analysis did not provide a straightforward means for directly linking 
quantitative findings from studies of micro-level processes of residential attainment 
to findings for segregation index scores at the aggregate level (e.g., city-level segregation
scores). As a result, the research literature has been divided into two important
but largely disconnected traditions. One is a tradition of macro-level studies
that use aggregate-level index scores for cities to investigate how segregation varies
across cities and over time; the other is a tradition of micro-level studies that exam- 
ine how various individual-level residential attainments are related to social and 
economic characteristics of individuals and households such as income, education, 
nativity, English language ability, family type, and other related individual-level 
variables. The current state of the literature leaves researchers in both traditions in 
the frustrating situation of being unable to directly connect segregation index scores 
at the aggregate-level to the individual-level outcomes that are examined and mod- 
eled in micro-level residential attainment analyses. I address this concern here by 
drawing on the difference-of-group-means measurement framework to develop 
methods for linking index scores to individual-level residential attainment pro- 
cesses. In this new approach, segregation index scores now can be interpreted as the 
effect of group membership (e.g., race) on segregation-determining residential out- 
comes in an individual-level attainment model. The level of segregation in a city 
thus can now be assessed by estimating the effect of group membership on 
individual-level residential attainment in a bivariate attainment model. More importantly,
the model can be extended to a multivariate specification to properly take
account of the role that nonracial characteristics (e.g., income) may play in shaping 
the level of segregation in a city. And the model can be further extended to multi- 
level specifications to take account of how city-level factors impact segregation net 
of the role of nonracial individual characteristics. Significantly, past findings of 
aggregate-level analyses can be exactly replicated and subsumed under this approach 
while giving researchers many new options for analysis. 

The third concern motivating me was that, under the current state of segregation 
measurement, many interesting and important research questions cannot be 
addressed because segregation index scores exhibit problematic behavior under a 
wide range of commonly occurring conditions. In particular, all indices of uneven 
distribution are subject to inherent positive bias that can render their scores untrust- 
worthy and potentially misleading in a variety of situations — for example, when 
segregation is measured at small spatial scales (e.g., at the block level) or when the 
groups involved in the segregation comparison are small and/or are imbalanced in 
size. This presents severe obstacles to many interesting and important lines of 
inquiry in segregation research. For example, it precludes quantitative study of seg- 
regation for newly arriving immigrant and migrant groups because, by definition, 
the groups initially are small in both absolute size and relative size in comparison to 
established population groups. Similarly, it precludes study of segregation among 
narrowly defined subgroups within a population (e.g., foreign- and native-born
Latinos, high-income Whites and Blacks, etc.) because one or both subgroups often 
are small in absolute and/or relative size. Additionally, it potentially frustrates inves- 
tigation of segregation dynamics using agent simulation models because studies in 
this tradition routinely examine segregation at small spatial scales. 
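
The scale dependence of this bias can be illustrated with a small simulation. The sketch below is not taken from the monograph; it assumes the conventional reading of "bias" as the tendency of an index to take positive values even when residents are assigned to areas without regard to group membership. The function names, the seed, and the area sizes are hypothetical choices made for illustration only.

```python
import random


def dissimilarity(areas):
    """Index of dissimilarity D from (group1, group2) counts per area."""
    n1 = sum(a for a, _ in areas)
    n2 = sum(b for _, b in areas)
    return 0.5 * sum(abs(a / n1 - b / n2) for a, b in areas)


def random_city(n_areas, area_size, share1, rng):
    """Build a city in which each resident belongs to group 1 with
    probability share1, independently of area (no systematic segregation)."""
    areas = []
    for _ in range(n_areas):
        g1 = sum(rng.random() < share1 for _ in range(area_size))
        areas.append((g1, area_size - g1))
    return areas


rng = random.Random(12345)  # fixed seed; an arbitrary illustrative choice

# Same total population (20,000), split into small versus large areas.
d_small_areas = dissimilarity(random_city(2000, 10, 0.5, rng))
d_large_areas = dissimilarity(random_city(20, 1000, 0.5, rng))

print("D with areas of 10 residents:  ", round(d_small_areas, 3))
print("D with areas of 1000 residents:", round(d_large_areas, 3))
```

Under these assumptions D is well above zero for the small areas and close to zero for the large ones, although group membership plays no role in residential assignment in either city.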

The impact of these concerns on segregation research is substantial, pervasive, 
and hard to overstate. It has led researchers to routinely adopt two “defensive” prac- 
tices. One practice is to use various ad hoc guidelines to “screen” cases to avoid 
measuring segregation in situations where index scores cannot be trusted. The other 
practice is to differentially weight cases to minimize the undesirable impact of bias
on index scores even after cases have been “screened” to eliminate those where 
index scores are most problematic. The first practice prevents researchers from 
undertaking many studies that otherwise would be conducted and thus sharply 
restricts the scope of segregation studies. In addition, it draws on ad hoc guidelines 
that at best are crude and at worst have uncertain effectiveness. The second practice 
of differentially weighting cases is predicated on the implicit recognition that the 
first practice of screening cases cannot adequately deal with the problem of bias. 
Unfortunately, differential weighting of cases is itself inadequate. First and fore- 
most, it leaves index scores untrustworthy on a case-by-case basis and so one cannot 
discuss and compare cases — otherwise weighting would be unnecessary. Second, 
while the strategy permits researchers to avoid “draconian” screening of cases and
thus retain larger nominal sample sizes, differential weighting in the end amounts to
assessing segregation patterns and trends based on the small subset of cases that get 
large weights. 

I address this unsatisfying state of affairs by developing and introducing refined 
versions of popular segregation indices that provide trustworthy measurements of 
segregation over a much broader range of situations than standard measures. I dem- 
onstrate that the resulting unbiased measures have attractive properties and provide 
researchers the previously unavailable option of dealing with index bias directly at 
the point of measurement on a case-by-case basis. 

As I worked to address the three concerns just mentioned, I increasingly took 
interest in a fourth concern — the question of whether different segregation indices 
yielded similar or different results and, if different, under what conditions and why. 
Conventional wisdom in the segregation measurement literature has been that the 
most widely used measures of uneven distribution tend to give similar results. But I 
found discrepancies between indices were common when I measured segregation 
over broader samples of cases and group comparisons. At first I thought the large 
discrepancies between scores of different indices might be a by-product of the prob- 
lem of index bias. After all, using broader samples tends to include cases that are 
more susceptible to being adversely affected by the problem of bias, and previous 
methodological studies had reported that indices vary in susceptibility to scores 
being inflated by index bias. But on investigating the issue further, I found that the 
role of bias was only a minor part of the story as discrepancies between scores for 
different indices persisted even when using refined versions of the indices that were 
free of the influence of bias. 

The difference-of-group-means framework provided a useful perspective for 
exploring this issue and led me to recognize that the discrepant scores I observed 
reflected an aspect of uneven distribution that is not widely appreciated,
namely, index sensitivity, or lack thereof, to whether displacement from even distri- 
bution is concentrated or dispersed. My goal in exploring this issue was different in 
nature from my goals in addressing the first three concerns I noted. In this case, I 
was not seeking to make progress toward solving technical problems in measuring 
and analyzing segregation. Instead, my goal was to clarify the nature of the differ- 
ences between indices to better account for why different indices sometimes yield 
different results. In the end, I concluded the issue could be framed succinctly in
terms of index sensitivity to whether group displacement from even distribution is
concentrated or dispersed. At any given nontrivial level of group displacement
from even distribution, groups can be concentrated in a way that produces homoge- 
neous areas for both groups, or groups can be dispersed in a way that minimizes 
homogeneous areas. Indices vary in their sensitivity to this aspect of uneven distri- 
bution. For example, the widely used index of dissimilarity (D) takes the same value 
regardless of whether displacement is concentrated or dispersed, while the separa- 
tion index (S) takes higher values when displacement is concentrated and takes low 
values when displacement is widely dispersed. 
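
A toy comparison makes the contrast concrete. The sketch below is illustrative only: the two hypothetical cities and the function names are assumptions, not data or code from the monograph. D is computed with its familiar formula, and S is computed as the difference between the two groups' mean area proportion of group 1, which is algebraically equal to the variance ratio.

```python
def dissimilarity(areas):
    """Index of dissimilarity D from (group1, group2) counts per area."""
    n1 = sum(a for a, _ in areas)
    n2 = sum(b for _, b in areas)
    return 0.5 * sum(abs(a / n1 - b / n2) for a, b in areas)


def separation(areas):
    """Separation index S: group 1's mean pairwise proportion of group 1
    in the area of residence minus group 2's mean on the same outcome."""
    n1 = sum(a for a, _ in areas)
    n2 = sum(b for _, b in areas)
    p = [a / (a + b) for a, b in areas]  # assumes no empty areas
    mean1 = sum(a * pi for (a, _), pi in zip(areas, p)) / n1
    mean2 = sum(b * pi for (_, b), pi in zip(areas, p)) / n2
    return mean1 - mean2


# Hypothetical cities with 1,000 residents in each group.
# Concentrated: displacement creates fully homogeneous areas.
concentrated = [(500, 0), (0, 500), (250, 250), (250, 250)]
# Dispersed: the same displacement is spread so that no area is homogeneous.
dispersed = [(750, 250), (250, 750)]

d_conc, s_conc = dissimilarity(concentrated), separation(concentrated)
d_disp, s_disp = dissimilarity(dispersed), separation(dispersed)
print("concentrated: D =", d_conc, " S =", s_conc)
print("dispersed:    D =", d_disp, " S =", s_disp)
```

In this constructed example both cities have the same D, while S is twice as large in the concentrated city as in the dispersed one.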

I am hardly the first to recognize the technical basis for this potential difference 
between indices. But I believe my discussion and review of these issues makes use- 
ful new contributions to the literature on segregation measurement. First off, the 
analyses I report here document that important discrepancies between different 
index scores are much more common than previous methodological studies have 
suggested. Second, the difference-of-means framework for measuring segregation I 
introduce here provides a new basis for understanding exactly how different indices 
can yield discrepant index scores. Finally, I offer analytic exercises and empirical 
case studies to further clarify the basis of differences between indices and dispel 
certain misconceptions regarding these issues.

My hope is that this monograph will contribute to a better understanding of the 
issues examined here and also will provide useful practical strategies for measuring 
and analyzing segregation. Looking back on the decade of work reflected here, I can 
see with hindsight that the core issues are closely interconnected. Establishing how
segregation index scores are related to group differences in residential outcomes was a
necessary step for developing methods for conducting micro-level analysis of
individual-level residential attainments that could directly account for overall segregation
in a city at the aggregate level. Discovering that the residential attainments in
question were rooted in a simple construct, the pairwise group proportions in the
area of residence, then paved the way for a further discovery, namely, that the troublesome
problem of index bias could be eliminated by making surprisingly simple
refinements in the calculation of pairwise group proportions. Thinking more carefully
about the individual-level residential outcomes that are registered by different
indices led to a better understanding of the differences between concentrated and 
dispersed displacement from even distribution. 

The interconnections among the issues are clearer in hindsight. If I had recog- 
nized them from the start, I would have avoided muddling around for so long. I offer 
my findings and observations on these and related matters here in hopes that others 
will find them useful. I apologize in advance for the many limitations of this study 
but also suggest that it occasionally offers original insights and new options for 
segregation measurement and analysis that I hope can help other researchers move 
the study of segregation forward. 

Many organizations and many people have provided support and encouragement 
that helped make my work possible. Over the past decade, I was fortunate to receive 
funding support for projects that helped me develop findings and observations 
included in this monograph. They included National Institutes of Health research
grants R43HD038199 and R44HD038199 “Simulating Residential Segregation
Dynamics: Phases I & II”; a proposal development grant from the Mexican American
and Latino Research Center at Texas A&M University, College Station; and National 
Science Foundation research grant SES 1024390 “New Methods for Segregation 
Research.” Of course the funding agencies are not responsible for and do not neces- 
sarily endorse the findings and conclusions I offer. Finally, I also acknowledge a 
faculty development leave from Texas A&M University that was crucial for com- 
pleting the first full draft of the monograph. I also thank the College of Liberal Arts 
and the Open Access to Knowledge (OAK) program at Texas A&M University 
which have provided generous funding to help publish this monograph as an open 
access work. 

I have always received encouragement and support from my colleagues and good
friends in the Sociology Department at Texas A&M University. In particular, I must 
mention Jane Sell, Dudley Poston, and Rogelio Saenz to offer special thanks for 
their support over this period of extended effort. I thank Wenquan “Charles” Zhang 
for engaging me in many stimulating and productive discussions on the issues 
addressed in this study and for collaborating with me as co-investigator on the 
aforementioned NSF project and associated empirical analyses that draw on the new 
measures and methods introduced in this work. Warner Henson III, currently a doc- 
toral student in the Sociology Department at Stanford University, deserves special 
acknowledgment for helping me establish the difference-of-means formulation of 
the Theil index while an undergraduate major in sociology here at Texas A&M 
University. I also must thank Amber Fox Crowell, at the time a doctoral student in 
sociology at Texas A&M University and now an assistant professor at California 
State University-Fresno, who served as research assistant on research projects in 
which the measures and methods introduced here were developed and refined. Her 
insights, questions, and suggestions have been extremely helpful. She has a deep 
grasp of the potential value the measures and methods can bring to empirical analy- 
ses and has applied them with great success in her dissertation project and her proj- 
ects as a postdoctoral researcher. I also offer a note of appreciation to the many 
students who have attended informal workshops on segregation measurement and 
analysis on a regular basis over the past several years including, in rough chrono- 
logical order, Warren Waren, Lindsay Howden, Amber Fox Crowell, Gabriel Amaro, 
Bianca Manago, Jennifer Davis, Jessica Barron, Nicole Jones, Bo Hee Yoon, 
Brittany Rico, Melissa Sanchez, Chiying Huang, Xuanren Wang, Katelyn Polk, 
Bridget Clark, Danielle Deng, Nathanael Rosenheim, Cassidy Castiglione, and 
Xinyuan Zou. Their questions, puzzlements, and suggestions helped me develop 
better ways of explaining and thinking about some of the material presented here. 
As ever throughout my career, I benefit from the voice of my dissertation supervisor,
mentor, and friend, Omer Galle, which is always present in the back of my mind
encouraging me and challenging me.

I also must acknowledge the amazing support Ms. Evelien Bakker and Ms. 
Bernadette Deelen-Mans with Springer have given during the process of bringing 
this monograph to completion. Apparently, it is impossible to exhaust their patience 
and goodwill. They have been encouraging, helpful, and pleasant at every step in the
process. For that I am truly grateful. 

I close by thanking my wife, Betsy, and our children Kate, Tyler, and Lane for 
their support and patience. They have brought, and continue to bring, great joy to 
my life. They have kept me grounded through the ups and downs of the too-long 
gestation period for this monograph. I love them more than they know and appreci- 

Chapter 1
Introduction and Goals 


The literature on residential segregation is one of the oldest empirical research tra- 
ditions in sociology and has long been a core topic in the study of social stratifica- 
tion and inequality as well as in the study of the demography of spatial population 
distribution. This literature is guided by the fundamental assumption that group 
differences in neighborhood residential outcomes are closely associated with social 
position and life chances. Accordingly, indices measuring segregation, especially 
the dimension of uneven distribution, are viewed as important summary indicators 
of overall group standing and scores for segregation indices have been a mainstay 
of research documenting levels, patterns, and trends in the residential segregation 
of racial and ethnic groups. Given the extensive attention social scientists have 
directed to the study of residential segregation, one might assume that the relation- 
ship between residential segregation and group differences in neighborhood resi- 
dential outcomes is well understood. Surprisingly, this is not the case. The issue has 
received little attention in the literature on segregation measurement. Consequently, 
researchers are not able to offer precise conclusions about group differences in resi- 
dential outcomes based on scores for popular and widely used indices of uneven 
distribution. 

In this monograph I address this deficiency in the literature by outlining a new 
approach to measuring uneven distribution. My goal is not to replace familiar, 
widely-used indices with new ones. Instead, I wish to place popular indices in a new 
alternative framework that clarifies the implications they carry for group differences 
in individual-level residential outcomes. My motivation for doing this rests on two 
convictions. One is that understanding how segregation is related to individual resi- 
dential outcomes is desirable for its own sake and brings valuable new options for 
interpreting segregation index scores and understanding differences between them. 
The other is that casting segregation indices in terms of group differences in indi- 
vidual residential outcomes brings benefits for segregation measurement and analy- 
sis including, as two primary examples, the ability to directly link segregation at the 
aggregate or macro level to micro-level processes of residential attainment and the 
ability to develop versions of the indices that are free of the troublesome problem of
inherent upward bias. 

Moving from generalities to specifics, my goal in this monograph is to set forth 
the “difference of means” framework, a new framework for segregation measure- 
ment wherein popular indices of uneven distribution are cast as simple differences 
of group means on residential outcomes that register group contact and exposure 
based on area racial composition. In accomplishing this goal I establish that all
widely used segregation indices, including the Gini Index (G); the Delta or
Dissimilarity Index (D); the Hutchens Square Root Index (R), an index with close
similarities to the Atkinson Index (A); the Theil Entropy Index (H); and the
Separation Index (S), also known as the variance ratio and by a variety of other names,
can be expressed as a difference of group means on individual- or household-level
residential outcomes (y) that are scored on the basis of index-specific scaling of 
group contact based on area group proportions. 
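
For two of these indices the equivalence can be checked numerically. The sketch below is a hedged illustration rather than the monograph's own computing formulas. It assumes that for S the outcome y is the pairwise proportion of group 1 in the area of residence, and that for D the outcome y is 1 when that proportion exceeds the city-wide pairwise proportion and 0 otherwise. Both scorings reproduce the traditional formulas algebraically, but the area counts and function names are hypothetical.

```python
def totals(areas):
    """City-wide counts for group 1 and group 2."""
    return sum(a for a, _ in areas), sum(b for _, b in areas)


def dissimilarity(areas):
    """Traditional D: half the summed absolute gap in area shares."""
    n1, n2 = totals(areas)
    return 0.5 * sum(abs(a / n1 - b / n2) for a, b in areas)


def separation(areas):
    """Traditional S: variance ratio (eta squared) of area proportions."""
    n1, n2 = totals(areas)
    P = n1 / (n1 + n2)
    var = sum((a + b) * (a / (a + b) - P) ** 2 for a, b in areas) / (n1 + n2)
    return var / (P * (1 - P))


def diff_of_means(areas, score):
    """Group 1 mean minus group 2 mean on outcome y = score(p, P),
    where p is the area's pairwise proportion of group 1."""
    n1, n2 = totals(areas)
    P = n1 / (n1 + n2)
    y = [score(a / (a + b), P) for a, b in areas]  # assumes no empty areas
    mean1 = sum(a * yi for (a, _), yi in zip(areas, y)) / n1
    mean2 = sum(b * yi for (_, b), yi in zip(areas, y)) / n2
    return mean1 - mean2


# Hypothetical (group 1, group 2) counts for four areas.
areas = [(90, 10), (60, 40), (30, 70), (20, 100)]

d_traditional = dissimilarity(areas)
d_means = diff_of_means(areas, lambda p, P: 1.0 if p > P else 0.0)
s_traditional = separation(areas)
s_means = diff_of_means(areas, lambda p, P: p)

print("D:", d_traditional, d_means)
print("S:", s_traditional, s_means)
```

The only thing that changes between the two indices in `diff_of_means` is the scoring function applied to the same area proportions, which is the sense in which the indices differ by their scaling of group contact.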

The indices just listed are all well-known and all have been reviewed in detail in 
many previous methodological studies (e.g., Duncan and Duncan 1955; Zoloth 
1976; James and Taeuber 1985; Stearns and Logan 1986; White 1986; Massey and 
Denton 1988; Hutchens 2001, 2004; Reardon and Firebaugh 2002). The contribu- 
tion I seek to make is to clarify a characteristic of these indices that currently is not 
well understood; namely, the particular way each one relates to and ultimately quan- 
titatively registers group differences in neighborhood residential outcomes. The 
sociological relevance of segregation index scores rests on the presumption that 
they carry important implications for group differences in social position and life 
chances that are associated with area of residence. Segregation researchers and con- 
sumers of segregation research thus generally assume that variation in segregation 
index scores tends to correlate with variation in a broad range of group disparities 
associated with neighborhood residential outcomes. 

It is definitely plausible to assume that summary index scores may serve as prox- 
ies for valuable, but usually unavailable information about residentially-based group 
inequality and disparity. But it is important to recognize that, in the final analysis, 
the calculations that yield segregation index scores revolve around a simple and 
very particular aspect of neighborhood residential outcomes — “pairwise” group 
proportions.¹ This residential outcome can be understood in multiple ways from the
point of view of individuals and households. For example, it can be understood as 
registering levels of contact or exposure based on co-residence with members of the 
two groups in the comparison. Alternatively, it can be understood as registering 
exposure to deviations or departures from the racial composition of the city as a 
whole. One of my goals is to clarify how different indices register group differences 
on individual residential outcomes relating to area racial mix and group proportions. 


¹ The expression “pairwise” ethnic mix or group proportion signifies that the calculations involved
use only the population counts for the two groups in the comparison. It can be contrasted with 
“overall” ethnic mix or group proportion based on the full population including groups not in the 
segregation comparison. Significantly, the presence and distributions of groups other than the two 
in the comparison have no bearing on scores for indices of uneven distribution.
In doing so I hope to help researchers better understand what indices specifically
measure in this regard when they are interpreting index scores and evaluating their 
relevance as proxies for group position. 

Indices of uneven distribution provide quantitative summaries of how groups are
differentially distributed across neighborhoods that vary on “pairwise” racial
mix. This obviously has direct implications for individual residential outcomes 
relating to racial mix and indices can be cast in two ways that reflect this fact. One 
option is to cast indices as simple, overall population averages on individual resi- 
dential outcomes scored on the basis of area racial mix. I review this option briefly, 
but I give it limited attention because it is not especially novel and it is not useful for
my main goals. The second option is to cast indices of uneven distribution as group 
differences of means on segregation-relevant neighborhood residential outcomes 
scored from pairwise racial mix. This approach is the primary focus of my attention 
because it resonates with substantive interests that motivate much of the research on 
segregation — namely, concerns about group disadvantage and inequality rooted in 
differential residential distribution. Additionally, the difference of means approach 
brings several practical advantages for segregation measurement and analysis. 

I offer the difference of means framework for computing indices of uneven dis- 
tribution in hopes that it will be a useful alternative to prevailing approaches to 
computing index scores. However, I stress from the outset that I intend this new 
framework to be an enhancement of and supplement to traditional approaches to 
segregation measurement, not a wholesale replacement. The difference of means 
framework does not yield different values for index scores. Instead, it yields identi- 
cal index scores but draws on new, mathematically equivalent index formulations to 
gain new understandings of segregation and new options for measurement, interpre- 
tation, and analysis. In current practice indices of uneven distribution are formu- 
lated and interpreted in ways that focus attention on aggregate-level patterns for 
spatial units (i.e., areas or neighborhoods). The formulas used generally feature 
calculations that register the extent to which the racial mix of areas (neighborhoods) 
within a city depart from the racial composition of the city as a whole. These widely 
used formulas are tried and true and they are useful and convenient for many pur- 
poses. That said, it also is important to recognize what the most widely used com- 
puting formulas neglect and obscure. Traditional approaches to measuring uneven 
distribution do not clarify how segregation is connected to group differences in
neighborhood residential outcomes for individuals. It is obvious that neighborhood 
departures from city racial composition necessarily carry implications for group 
differences in residential outcomes. But the specific nature of these implications is 
not well understood because it is not revealed in prevailing approaches to formulat- 
ing, computing, and interpreting segregation indices. 

The “difference of means” framework for calculating and interpreting popular 
segregation indices I introduce here addresses this gap in the literature on segrega- 
tion measurement. The framework highlights something that currently is not widely 
appreciated — that differences between indices can be understood as arising from a 
single factor, the particular way each index registers segregation-relevant residential 
outcomes for individuals as scored from area racial composition. On reflection this 
probably should not be surprising. All indices are calculated from the same underlying
distribution of residential outcomes on pairwise racial proportions. Consequently,
index scores obtained from group differences of means on residential outcomes can 
differ only by registering these very specific residential outcomes in different ways. 
These cross-index differences in “scoring” area racial mix provide a new basis for 
comparing and evaluating indices of uneven distribution. 

The difference of means formulation of indices of uneven distribution brings 
additional practical benefits beyond clarifying how index scores are related to group 
differences in residential outcomes. One example is that the approach makes it pos- 
sible to join the study of aggregate segregation with the study of individual-level 
residential attainment in a seamless way. This becomes possible because segrega- 
tion index scores now can be viewed as arising from the simple additive aggregation 
of segregation-relevant, neighborhood residential outcomes for individuals. As a 
result, segregation index scores can be equated with the effect of race in micro-level 
regression models predicting the residential attainments of individuals and households
that additively determine segregation at the aggregate level.² These micro-level
attainment models can be extended to include multiple individual and
household characteristics as predictors in the attainment equation. This then enables 
researchers to assess segregation — now equated to the effect of race on residential 
attainments — in multivariate specifications that control for non-racial factors (e.g., 
income, nativity, language ability, etc.) that also may affect the residential attain- 
ments that ultimately determine segregation. The new ability to model the individual- 
level residential attainments that directly and additively give rise to segregation 
makes it possible to undertake quantitative standardization and decomposition anal- 
yses to assess the extent to which group differences on factors other than race con- 
tribute to overall segregation based on their impact on residential outcomes that 
determine aggregate segregation. Finally, city-specific, individual-level models of 
residential attainments can be extended to multi-level specifications that can be used 
to investigate variation in segregation over time and across cities in new ways that 
previously were not feasible. 

The kinds of analysis options just described have been available and used on a 
routine basis for decades in the broader literature investigating racial differences in 
most domains of socioeconomic attainment (e.g., education, income, occupation, 
etc.). Until now, however, they have been not been available in segregation research. 
The reason for this is that segregation, in contrast to racial inequality in other socio- 
economic attainments such as education, occupation, and income, has not been 
explicitly formulated in terms of group differences on individual attainments. 
Placing indices of uneven distribution in the difference of means framework thus 
puts segregation analysis on similar conceptual footing with research traditions that 
analyze other aspects of racial socioeconomic disparity and inequality. 


? Specifically, the segregation index score is equal to the value of the unstandardized regression 
coefficient for race (coded as 0 or 1) in an individual-level regression predicting residential 
outcomes. 
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The difference of means formulation of indices of uneven distribution brings 
other benefits as well. One conceptual benefit is to introduce a new basis for evaluat- 
ing and choosing among familiar indices; namely, whether and to what degree the 
individual-level residential outcomes registered by a given index are relevant for 
theories of segregation dynamics and racial socioeconomic stratification. Another 
practical benefit is that the approach makes it easy to implement spatial versions of 
popular segregation indices. 

Last but not least, the difference of means formulation of segregation indices 
provides a basis for gaining a better understanding of the source of index bias, a well-known 
and vexing problem that can make scores of standard versions of indices of 
uneven distribution untrustworthy and potentially misleading. This new understand- 
ing then makes it possible to develop unbiased versions of popular indices based on 
implementing surprisingly simple refinements to index formulas that eliminate this 
problematic behavior of index scores. 

In the chapters that follow I introduce the difference of means formulations of 
widely used segregation indices and provide more detailed reviews of the new 
options for measurement, interpretation, and analysis just mentioned. In Chaps. 2, 
3, 4, and 5 I introduce the difference of means framework and explore differences 
between indices as revealed through the lens of this framework. I begin in Chap. 2 
by noting that scores of popular indices of uneven distribution can be obtained using 
a variety of mathematically equivalent formulas and I briefly review selected formu- 
las to highlight how they support different insights about segregation measurement. 
I conclude the chapter by introducing the difference of means formulas that are used 
throughout this monograph. In Chap. 3 I provide a general overview of the differ- 
ence of means framework. I then expand on this in Chap. 4 by offering a more 
detailed discussion of how individual measures of uneven distribution can be cast as 
difference of group means on residential outcomes scored from area racial propor- 
tions. In Chap. 5 I note a useful insight about uneven distribution that emerges from 
the difference of means framework; namely, that differences between indices can be 
seen as arising from a single source — how each index registers individual residential 
outcomes scored from area group proportions. 

In Chaps. 6, 7, and 8 I review the logical and empirical differences among popu- 
lar measures of uneven distribution and offer suggestions regarding how to under- 
stand and interpret these differences. In Chap. 6 I document that, in contrast to 
findings reported in some previous methodological studies, popular indices of uneven 
distribution can and often do yield highly discrepant scores. The analyses I present 
here establish that the findings of earlier methodological studies — which reported 
that popular indices tended to be highly correlated in empirical application — are a 
byproduct of focusing primarily on White-Minority segregation in a small subset of 
large metropolitan areas where the minority group is a substantial presence in terms 
of relative group size and where group residential distributions are characterized by 
a particular pattern of “prototypical” segregation. This is a pattern of uneven distri- 
bution in which group displacement from parity involves a high level of group sepa- 
ration and area racial polarization because both groups are disproportionately 
concentrated in homogeneous areas. I refer to uneven distribution with this pattern 
of “concentrated displacement” as “prototypical segregation” because this signature 
pattern — in which all popular measures of uneven distribution take high scores — is 
always present in crafted examples used to illustrate high segregation in didactic 
discussions of segregation measurement. Similarly, it also is invariably present in 
empirical cases used to illustrate high levels of segregation. So it is easy to understand 
that many would not be aware that popular indices can take substantially discrepant 
scores. 

The empirical analyses I present in Chap. 6 document that uneven distribution 
does not always take the form of prototypical segregation. To the contrary, the anal- 
yses instead reveal that broader samples of cities include a large number of cases 
with a sharply contrasting pattern of “dispersed displacement” wherein uneven dis- 
tribution involves extensive group displacement from parity but does not involve 
group separation and area racial polarization. In these situations, index scores can 
be highly discrepant. Specifically, indices that are sensitive to differential displace- 
ment — such as the gini index (G) and the dissimilarity index (D) which Duncan and 
Duncan (1955) aptly also termed the displacement index — will take high scores 
while the Theil index (H) and the separation index (S) — which Stearns and Logan 
(1986) note is sensitive to residential separation and area racial polarization — will 
take low scores. 

In Chap. 7 I review the distinction between concentrated and dispersed displace- 
ment in more detail. The chapter makes two important points. One is that the socio- 
logical implications of uneven distribution involving “prototypical segregation” and 
D-S concordance are fundamentally different from the sociological implications of 
uneven distribution with dispersed displacement and substantial D-S divergence. 
Simply put, a high level of group separation is obviously substantively compelling 
and necessarily entails a high level of displacement. But the reverse is not true. 
Thus, high levels of displacement do not always entail high levels of group separa- 
tion and this should be noted when it occurs because the literature on segregation 
measurement provides no clear basis for viewing differential displacement without 
group separation as sociologically important. The second point I make in this chap- 
ter is that the largely unrecognized but empirically common outcome of dispersed 
displacement is not an artifact of relative group size or deficiencies in indices that 
are more sensitive to group separation than displacement. To make this point I intro- 
duce and exercise simple analytic models to show that when non-trivial displace- 
ment from even distribution is present, it can be concentrated or it can be dispersed. 
Concentrated displacement produces “prototypical segregation” wherein the score 
of S will approach or even equal the score for D indicating that displacement 
involves group separation and area polarization. In the case of dispersed displace- 
ment, D will be equally high but S will be low signaling that group separation and 
area polarization are minimal, sometimes to the point of being negligible. I review 
the principles of transfers and exchanges from segregation measurement theory to 
establish that D-S discrepancies of this sort arise because D is flawed and does not 
conform to these accepted principles of segregation measurement. 
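
The contrast between concentrated and dispersed displacement can be illustrated with a small numerical sketch. The two cities below are hypothetical and were constructed only for this illustration; they are not drawn from the analyses in Chap. 7. Both have a 2% minority population and nearly the same value of D, and S is computed here as the group difference of means on area proportion, which is an assumption about how S is scored.

```python
def dissimilarity(areas):
    """D from area counts [(n1, n2), ...]: half the summed gap in area shares."""
    N1 = sum(n1 for n1, _ in areas)
    N2 = sum(n2 for _, n2 in areas)
    return 0.5 * sum(abs(n1 / N1 - n2 / N2) for n1, n2 in areas)


def separation(areas):
    """S as the group-1 mean minus the group-2 mean of p = n1 / (n1 + n2)."""
    N1 = sum(n1 for n1, _ in areas)
    N2 = sum(n2 for _, n2 in areas)
    mean1 = sum(n1 * n1 / (n1 + n2) for n1, n2 in areas) / N1
    mean2 = sum(n2 * n1 / (n1 + n2) for n1, n2 in areas) / N2
    return mean1 - mean2


# Hypothetical city A (concentrated): 80% of the minority lives in
# all-minority areas; the remainder is spread thinly over majority areas.
concentrated = [(10.0, 0.0)] * 16 + [(40.0 / 98, 100.0)] * 98

# Hypothetical city B (dispersed): every minority resident lives in an
# area that is only 10% minority; all other areas are 0% minority.
dispersed = [(10.0, 90.0)] * 20 + [(0.0, 100.0)] * 80

for name, city in [("concentrated", concentrated), ("dispersed", dispersed)]:
    d = 100 * dissimilarity(city)
    s = 100 * separation(city)
    print(f"{name:12s}  D = {d:5.1f}  S = {s:5.1f}")
```

In this sketch D is roughly 80 in both cities, while S is close to D in the concentrated city and below 10 in the dispersed city.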

Chapter 8 supplements the analytic results by discussing the sociological dynamics 
that are likely to influence whether non-trivial displacement takes the form of 
“prototypical segregation” or the substantively less compelling pattern of dispersed 
displacement. It also reviews case studies of empirical examples of high-D, high-S 
combinations that occur in communities where the minority group is small in relative size. 
The discussion here drives home two important points. One is that scores for D and 
S can be congruent or discrepant in any setting where displacement from even 
distribution is non-trivial. The other is that sociological dynamics, not artifacts of 
index construction, determine whether in fact D and S are congruent or discrepant 
in a given community. 

In Chap. 9 I show how the difference of means framework creates new options 
for research by joining micro- and macro-level analysis of segregation. At the sim- 
plest level, casting segregation as a difference of group means on residential out- 
comes leads to the new insight that segregation index scores are exactly 
mathematically equivalent to the effect of race in bivariate regression analyses pre- 
dicting segregation-determining residential outcomes for individuals. I then argue 
that this insight opens the door to the new possibility of using multivariate regres- 
sion analyses to quantitatively assess how segregation arises from two sources. The 
first source is group differences on distributions of social and economic characteris- 
tics that are salient in residential attainment processes. The second source is group 
differences in the efficacy of how inputs to residential attainment processes translate 
into segregation-determining residential outcomes. In this framework, segregation 
can be analyzed in greater detail and sophistication by using standardization and 
decomposition analysis in combination with multivariate regression analysis of 
attainments, methods that are routinely applied to the study of racial inequality in 
education, occupation, income, health, and other important stratification outcomes. 
This is a major advance as research on segregation has lagged behind research on 
group disparities in other domains where aggregate-level outcomes on group dis- 
parities have long been routinely analyzed as outgrowths of micro-level attainment 
processes. 
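
The equivalence between an index score and the effect of race in a bivariate regression can be checked with a short Python sketch. The area counts are hypothetical, and the residential outcome is scored as area proportion for group 1 (the scoring assumed here for the separation index S); the helper `ols_slope` is written out by hand for illustration rather than taken from any statistics library.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x (bivariate regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical area counts (group 1, group 2); not data from the monograph.
areas = [(8, 2), (5, 5), (1, 9), (6, 14)]

# One record per individual: a race dummy and the area proportion they reside with.
race, outcome = [], []
for n1, n2 in areas:
    p = n1 / (n1 + n2)
    race += [1] * n1 + [0] * n2
    outcome += [p] * (n1 + n2)

slope = ols_slope(race, outcome)

mean_group1 = sum(y for r, y in zip(race, outcome) if r == 1) / race.count(1)
mean_group2 = sum(y for r, y in zip(race, outcome) if r == 0) / race.count(0)
diff_of_means = mean_group1 - mean_group2

# Area-based variance-ratio form of S: sum of t_i (p_i - P)^2 over T * P * Q.
T = len(race)
P = race.count(1) / T
Q = 1 - P
s_area = sum((n1 + n2) * (n1 / (n1 + n2) - P) ** 2 for n1, n2 in areas) / (T * P * Q)

print(round(slope, 6), round(diff_of_means, 6), round(s_area, 6))
```

The three printed values agree: the regression coefficient for the race dummy, the difference of group means, and the area-based score are the same quantity computed three ways.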

In Chap. 10 I show how the regression analysis of individual-level residential 
attainments can subsume comparative analysis of cross-city variation in segregation 
and create new possibilities for investigating the factors contributing to variation in 
segregation across cities and over time. The new approach involves extending city- 
specific analysis of segregation using bivariate and multivariate models of individual- 
level residential attainment to multi-level specifications that reveal how the process 
determining segregation varies across cities and over time. I first note that findings 
from aggregate-level analyses of cross-city variation in segregation can be exactly 
reproduced using multi-level specifications of segregation-attainment models. I 
then outline how this specification opens the door for improving the ability of 
researchers to accurately assess the role that non-racial characteristics such as 
income may play in shaping cross-city variation in segregation. 

Previous research has often tried to assess the impact of group differences on 
income and other individual-level characteristics by the method of aggregate-level 
regression analysis. I note that this approach is prone to yield flawed results because 
it runs afoul of the “aggregate fallacy.” The problem is hidden from view and less 
obvious when segregation is viewed only as a macro-level outcome. It becomes 
clear and more readily evident when segregation is analyzed within the difference 
of means framework where the outcome of segregation at the aggregate-level is 
exactly determined by individual-level attainment processes. I demonstrate the 
importance of the problem by showing that results of analyses that assess the impact 
of group income differences on segregation using aggregate-level regressions are 
contradicted by multi-level regression analyses that avoid the aggregate fallacy and 
properly take account of the effects of income at the individual-level. 

In Chaps. 11, 12, and 13 I review topics that benefit from insights and perspec- 
tives gained from drawing on the difference of means framework for analyzing seg- 
regation. In Chap. 11 I note that the difference of means framework makes it easy 
for researchers to implement spatial versions of popular segregation indices if they 
desire to do so. The reason for this is simple; the residential attainments for indi- 
viduals that determine segregation scores can be computed using mutually-exclusive 
bounded areas, or using overlapping, spatially-defined areas. The former yields a 
traditional aspatial index score. The latter yields a “spatial” index score that is 
affected by how neighborhoods that vary in racial composition are distributed in 
space. 
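
The distinction between bounded and overlapping areas can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: areas are arranged along a line, the "spatial" outcome pools each area with its immediate neighbors (the hypothetical `reach` parameter), and the index is the difference of group means on area proportion. The spatial versions developed in Chap. 11 may define neighborhoods differently.

```python
def diff_of_means(n1, n2, p):
    """Group-1 mean minus group-2 mean of the residential outcome p, by area."""
    mean1 = sum(a * pi for a, pi in zip(n1, p)) / sum(n1)
    mean2 = sum(b * pi for b, pi in zip(n2, p)) / sum(n2)
    return mean1 - mean2


def aspatial_p(n1, n2):
    """Area proportion from mutually exclusive bounded areas."""
    return [a / (a + b) for a, b in zip(n1, n2)]


def spatial_p(n1, n2, reach=1):
    """Area proportion from overlapping windows: each area plus neighbors within `reach`."""
    out = []
    for i in range(len(n1)):
        lo, hi = max(0, i - reach), min(len(n1), i + reach + 1)
        pooled1, pooled2 = sum(n1[lo:hi]), sum(n2[lo:hi])
        out.append(pooled1 / (pooled1 + pooled2))
    return out


# Two hypothetical cities with identical homogeneous areas arranged differently.
clustered = ([10, 10, 10, 0, 0, 0], [0, 0, 0, 10, 10, 10])
checkerboard = ([10, 0, 10, 0, 10, 0], [0, 10, 0, 10, 0, 10])

for name, (n1, n2) in [("clustered", clustered), ("checkerboard", checkerboard)]:
    aspatial = diff_of_means(n1, n2, aspatial_p(n1, n2))
    spatial = diff_of_means(n1, n2, spatial_p(n1, n2))
    print(f"{name:12s}  aspatial = {aspatial:.3f}  spatial = {spatial:.3f}")
```

The aspatial score is identical for the two arrangements, while the windowed score differs sharply between them, which is the sense in which a spatial score responds to how areas are distributed in space.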

In Chaps. 12 and 13 I argue that the difference of means framework leads to new 
perspectives regarding what aspects of residential segregation researchers will view 
as most compelling on substantive grounds. In Chap. 12 I argue that group separa- 
tion is a more compelling substantive concern than mere displacement from even 
distribution. I frame the issue as follows. It is non-controversial to assert that group 
separation and area racial polarization is substantively important because it is a logical 
prerequisite for group disparity and inequality on neighborhood-based residential 
outcomes. In contrast, there is no established basis for arguing that displacement 
from even distribution is substantively important when it does not involve group 
separation and area racial polarization. The only candidate is the “volume of move- 
ment” interpretation of D in which a high value of D does indicate that a large frac- 
tion of one group must move to bring about exact even distribution. But it is rendered 
irrelevant in situations where movement to exact even distribution has no impact on 
group separation and area racial polarization. 

In Chap. 13 I consider how being sensitive to different aspects of uneven distri- 
bution makes different indices more or less relevant for theories of segregation. I 
note that measures rooted in the segregation curve (G and D) are sensitive to rank-order 
differences on the residential outcome of area racial proportion but are relatively 
insensitive to the quantitative magnitude of the differences involved. In 
contrast, the separation index is sensitive to the quantitative magnitude of the differ- 
ences because it registers the residential outcome of area racial composition in its 
natural metric. Segregation dynamics such as “tipping,” resulting from group dif- 
ferentials in entries and exits to areas, and discrimination to exclude groups from 
areas are thought to be triggered by area group proportions. In contrast, theories of 
segregation dynamics rarely direct attention to rank order position on area racial 
composition over and above its association with area racial composition itself. 

In Chaps. 14, 15, and 16 I give attention to the problem of index bias. All indices 
of uneven distribution have the undesirable property that their scores are subject to 
inherent upward bias that can be non-negligible and varies in magnitude across 
individual cases. I draw on the difference of means framework to first identify the 
source of index bias and then identify a solution for obtaining unbiased versions of 
all popular indices of uneven distribution. I use formal analytic models and empiri- 
cal exercises to demonstrate that the unbiased versions of G, D, R, H, and S behave 
as desired in analytic exercises and in empirical applications. Significantly, the dif- 
ference of means framework is crucial because it provides the vantage point needed 
to identify both the source of the problem and its solution, both of which turn out to 
be surprisingly simple and intuitive. In this new formulation index scores are calcu- 
lated as differences of group means on individual residential outcomes scored from 
area racial proportion. The source of bias can be traced to how area racial proportion 
is assessed from the perspective of individuals. In the “standard” (biased) formula- 
tion, the individual in question is included in the area counts used to calculate area 
racial proportion. The value of this residential outcome for the individual thus 
reflects a combination of two things: the individual’s own contribution to area racial 
mix and the racial mix of neighbors — the other individuals in the area. Under ran- 
dom assignment the racial mix of neighbors is a random draw and every individual, 
regardless of group membership, has the same expected distribution of outcomes on 
racial mix of neighbors. In contrast, an individual’s own contribution to area racial 
mix is fixed and, importantly, differs systematically with group membership. This is 
the source of index bias. Once seen from this vantage point, the problem of index 
bias can then be eliminated by assessing area racial proportion for individuals based 
on neighbors instead of area population. 
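
The logic of this argument can be checked with a small simulation. The sketch below is illustrative only: it assumes equal-sized areas, random assignment of individuals to areas, and the separation index scored as the group difference of means on area proportion. The function names, the area size of 10, and the 20% group share are hypothetical choices made for the example.

```python
import random


def separation_pair(areas):
    """Return (standard S, neighbors-only S) for areas given as lists of 0/1 group labels."""
    std_sum = [0.0, 0.0]  # summed outcomes by group, own presence included
    nbr_sum = [0.0, 0.0]  # summed outcomes by group, neighbors only
    count = [0, 0]
    for members in areas:
        t = len(members)
        n1 = sum(members)
        for g in members:
            std_sum[g] += n1 / t               # individual counted in the area
            nbr_sum[g] += (n1 - g) / (t - 1)   # individual removed from the counts
            count[g] += 1
    std = std_sum[1] / count[1] - std_sum[0] / count[0]
    nbr = nbr_sum[1] / count[1] - nbr_sum[0] / count[0]
    return std, nbr


def simulate(reps=50, n_areas=200, area_size=10, share=0.2, seed=0):
    """Average both versions of S over repeated random assignments."""
    rng = random.Random(seed)
    total = n_areas * area_size
    n_group1 = round(total * share)
    labels = [1] * n_group1 + [0] * (total - n_group1)
    std_total = nbr_total = 0.0
    for _ in range(reps):
        rng.shuffle(labels)
        areas = [labels[i:i + area_size] for i in range(0, total, area_size)]
        std, nbr = separation_pair(areas)
        std_total += std
        nbr_total += nbr
    return std_total / reps, nbr_total / reps


mean_standard, mean_neighbors = simulate()
print(round(mean_standard, 3), round(mean_neighbors, 3))
```

Under random assignment the standard version averages near 1 divided by the area size (about 0.10 here), while the neighbors-only version averages near zero.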

The solution to index bias I offer in this monograph is attractive for many rea- 
sons. To begin, when working within the difference of means framework for calcu- 
lating indices of uneven distribution, the solution is simple and intuitive, even 
“obvious.” Second, the unbiased measures do not require radical changes in research 
practices. Researchers can continue to use the same measures they have used for 
decades. But now they can use refined versions of these measures that will yield 
scores that are free of bias at the level of individual cases in situations where 
researchers previously could not trust index scores. In other situations the scores of 
the refined versions will be essentially identical to scores obtained using standard 
computing formulas. In sum, the new versions will exactly replicate research findings 
obtained using standard index versions when measurement is non-problematic 
and will yield superior results when standard calculations cannot be trusted. 

In Chap. 17 I offer final comments on the contributions of the monograph overall 
and reiterate my hope that the new options for measurement and analysis I introduce 
here will enable researchers to investigate residential segregation in more 
detail and depth than has previously been possible. Significantly, the benefits gained 
from using the new options of measurement and analysis are “cost free”; there are 
no penalties or sacrifices associated with adopting them. Researchers do not have to 
put aside familiar measures and replace them with unfamiliar ones. The difference 
of means framework for measuring segregation permits researchers to exactly rep- 
licate results of past studies while at the same time giving them new options for 
refined measurement, expanded analysis, and attractive substantive interpretations. 
Thus, researchers can maintain continuity with previous studies of aggregate 
segregation while simultaneously having the option of taking advantage of opportunities 
to analyze segregation in new ways to gain a deeper, more detailed understanding 
of segregation patterns. 

Chapter 2 
Alternative Formulas for Selected Indices 


The values of all popular indices of uneven distribution can be obtained using a 
variety of mathematically equivalent computing formulas. For a given index some 
formulas are more familiar and widely used than others, but no single formula can 
be declared sacred or best for all purposes. The many alternatives can be confusing 
to those who are new to segregation research. But their availability benefits research- 
ers by providing a variety of options from which to choose to best serve the needs 
of a particular study. The relevant considerations can include factors such as effi- 
ciency of computation, ease of explaining the index to broad audiences, relevance 
for establishing appealing substantive interpretations, capacity for enabling practi- 
cal tasks such as decomposition analysis or the calculation of spatial versions of 
index scores, and utility for pinpointing technical issues in segregation measure- 
ment. Researchers may choose a particular formula specifically to serve the needs 
of a given study. Or they may use a formula based on familiarity and habit. But in 
one crucial sense the choice is unimportant as all valid formulas can be used inter- 
changeably without affecting the results of individual index scores, research find- 
ings, and substantive conclusions. 

To specialists well-versed in the literature on segregation measurement these are 
not surprising observations. Nevertheless, I raise the point because many research- 
ers and most consumers of segregation research understand the quantitative under- 
pinnings of segregation index scores based primarily on a handful of popular 
computing formulas. This is not a problem in itself. But problems can arise when 
lack of familiarity with mathematically equivalent alternatives makes individuals 
resistant to insights and interpretations that can be gained by drawing on alternative 
formulations of a particular index. This leads me to suggest that, while some formu- 
las for popular indices of uneven distribution are better known and more widely 
used, it can be useful to consider other, less well known alternatives. In this chapter 
I discuss three classes of formulas. The formulas in the first group, which includes 
some well-known formulas that are very widely used in empirical research, focus 
attention on outcomes for areas and provide little insight into the relationship 
between residential segregation and residential outcomes for individuals. The formulas 
in the second group establish that indices of uneven distribution are connected 
to the residential outcomes of individuals, but they do not provide a basis for 
gaining insight into how residential outcomes differ across groups. The formulas in 
the third group go one step further and establish that indices of uneven distribution 
can be cast in ways that reveal how segregation is specifically connected to group 
differences on individual-level residential outcomes associated with neighborhood 
racial composition. 


$G = 100 \cdot \left( \sum X_{i-1} Y_i - \sum X_i Y_{i-1} \right)$   (Duncan and Duncan 1955) 
$D = 100 \cdot \tfrac{1}{2} \sum \left| (n_{1i}/N_1) - (n_{2i}/N_2) \right|$   (Duncan and Duncan 1955) 
$R = 100 \cdot \left( 1.0 - \sum \sqrt{(n_{1i}/N_1) \cdot (n_{2i}/N_2)} \right)$   (Hutchens 2001:23) 


Fig. 2.1 Examples of selected area-based computing formulas for indices of uneven distribution 
(Notes: $N_1$ and $N_2$ denote city-wide population counts for the two groups in the comparison; 
$T = N_1 + N_2$; $i$ denotes area; $n_{1i}$ and $n_{2i}$ denote the area counts for the two groups in the segregation 
comparison; and $X_i$ and $Y_i$ denote the cumulative proportions of groups 1 and 2, respectively, over 
areas ranked from low to high on $p_i$ obtained from $n_{1i}/(n_{1i}+n_{2i})$. A summary of notation used is 
given in the Appendices) 

Many, perhaps most, readers will have given little thought to how indices of 
uneven distribution are linked to individual residential outcomes. This would not be 
surprising as this aspect of indices of uneven distribution has not been emphasized 
in the literature on segregation measurement. It also is not obvious from inspecting 
the most widely used computing formulas for popular indices. Alternative formulas 
that do highlight the property tend not to be well known in addition to being infre- 
quently used. In view of this, I use this chapter to briefly introduce formulas that 
highlight individual residential outcomes and contrast them with standard comput- 
ing formulas. To streamline presentation, I offer minimal commentary here on the 
derivations of the new formulas that are introduced in this chapter. For those who 
are interested, I provide derivations and more detailed discussion of related technical 
issues in the Appendices. In Chaps. 3, 4, and 5 in the body of the monograph I 
provide general discussions of the new formulas introduced here and then review 
their benefits for segregation measurement and analysis throughout the remainder 
of the study. 

I begin by introducing computing formulas for three indices of uneven distribu- 
tion that have very close relations to the segregation curve; namely, the gini index 
(G), the dissimilarity or delta index (D), and the Hutchens square root index (R). 
The formulas are given in Fig. 2.1. The formulas for G and D are likely to be famil- 
iar to many readers as they are widely used in segregation studies. In no small part 
this is because these formulas were introduced in Duncan and Duncan (1955), a 
landmark methodological study that served as the definitive guide to segregation 
measurement for three decades. In addition, they have continued to remain popular 
because they are convenient computing formulas that are relatively easy to 
implement in empirical analyses. The formula for R was introduced more recently 
(Hutchens 2001) but I include it with the formulas for D and G because all three 
measures have close relations to the segregation curve and, as I document later in 
Chap. 6, all three are highly correlated in empirical applications. G and D are better 
known to sociologists. But R has technical properties that make it an attractive index 
to consider if one is committed to using a measure with close relations to the segre- 
gation curve. 

The point I make about these three formulas is that they focus attention on out- 
comes for areas, not outcomes for individuals. The formulas adopt this orientation 
in part because it is efficient for computing index scores from area tabulations — a 
fact of non-trivial practical import in the early era of segregation research when 
Duncan and Duncan’s study first appeared. In addition, these formulas fit comfort- 
ably with approaches to thinking about segregation that have an aggregate-level 
focus and frame the assessment of even distribution from the point of view of 
whether or not the racial composition of areas or neighborhoods matches the racial 
composition of the city as a whole. I note, however, that something important is left 
mysterious and obscure in these formulas. It is the residential outcomes that the 
individuals residing in these areas experience and how these outcomes may or may 
not vary systematically for the two groups in the segregation comparison. 

The formulas for G and D given here are probably the two most widely applied 
computing formulas for measuring residential segregation. They also are likely to 
be the first two computing formulas students of segregation research learn. The fact 
that these formulas provide little to no basis for drawing insights about how segre- 
gation is connected to residential outcomes for individuals speaks volumes about 
the state of the literature on segregation measurement. 
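
The area-based formulas in Fig. 2.1 translate directly into code, as the following Python sketch suggests. It is a minimal illustration, not a reference implementation: the function name and the example counts are hypothetical, and the absolute value applied to G is my own safeguard, since the sign of the raw cumulative sum depends on the direction in which areas are ranked.

```python
from math import sqrt


def area_indices(areas):
    """Return (G, D, R) on a 0-100 scale from area counts [(n1, n2), ...]."""
    N1 = sum(n1 for n1, _ in areas)
    N2 = sum(n2 for _, n2 in areas)

    # D: half the summed absolute difference in the two groups' area shares.
    d = 50.0 * sum(abs(n1 / N1 - n2 / N2) for n1, n2 in areas)

    # R: one minus the summed geometric mean of the two groups' area shares.
    r = 100.0 * (1.0 - sum(sqrt((n1 / N1) * (n2 / N2)) for n1, n2 in areas))

    # G: cumulate shares over areas ranked from low to high on p_i.
    ranked = sorted(areas, key=lambda a: a[0] / (a[0] + a[1]))
    x_prev = y_prev = 0.0
    total = 0.0
    for n1, n2 in ranked:
        x, y = x_prev + n1 / N1, y_prev + n2 / N2
        total += x_prev * y - x * y_prev
        x_prev, y_prev = x, y
    g = 100.0 * abs(total)  # abs(): sign depends on ranking direction (assumption)

    return g, d, r


# Hypothetical examples: complete segregation, even distribution, and a mixed case.
print(area_indices([(10, 0), (0, 10)]))
print(area_indices([(5, 10), (10, 20)]))
print(area_indices([(8, 2), (5, 5), (1, 9), (6, 14)]))
```

Note that nothing in these calculations refers to individuals; every quantity is an area share or a cumulative sum of area shares.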

Figure 2.2 provides alternative formulas for G, D, and R and adds in similar 
formulas for two additional indexes, the Theil entropy index (H) and the separation 
index (S) (also known as eta squared [$\eta^2$] and the variance ratio). With the exception 
of the formula for R, these computing formulas also are likely to be familiar to many 
readers because they have been featured in many important methodological studies 
(e.g., Duncan and Duncan 1955; Zoloth 1976; James and Taeuber 1985; White 
1986; Massey and Denton 1988). They, or close variations on them, are widely used 
in segregation studies. In no small part this is because they are convenient comput- 
ing formulas that are relatively easy to implement in empirical analyses. 

The formulas in Fig. 2.2 have a key feature in common. Each formula incorporates 
the term “$t_i$” in the core calculations leading to the index value. This term represents 
the combined population of the two groups in the comparison residing in the i’th 
area in the city. The calculations involving this term are cumulated over all areas 
and at some point are divided by “T,” the combined city-wide total populations of 
the two groups. Based on this construction, the index score can be understood as an 
average value for a quantitative result assessed for all individuals in the segregation 
comparison. 
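
The common structure of the Fig. 2.2 formulas, in which terms weighted by area population are cumulated and divided by T, can be sketched in Python as follows. The function names and the example counts are hypothetical, and the code is a plain transcription of the formulas as I read them rather than an optimized implementation.

```python
from math import log2, sqrt


def entropy(p):
    """Two-group entropy in bits; terms with zero proportion contribute nothing."""
    return sum(x * log2(1.0 / x) for x in (p, 1.0 - p) if x > 0)


def fig22_indices(areas):
    """Return G, D, R, H and both forms of S (0-100) from area counts [(n1, n2), ...]."""
    t = [n1 + n2 for n1, n2 in areas]            # combined area populations t_i
    p = [n1 / (n1 + n2) for n1, n2 in areas]     # pairwise area proportions p_i
    T = sum(t)
    P = sum(n1 for n1, _ in areas) / T
    Q = 1.0 - P
    E = entropy(P)
    tp = list(zip(t, p))

    g = sum(ti * tj * abs(pi - pj) for ti, pi in tp for tj, pj in tp) / (2 * T * T * P * Q)
    d = sum(ti * abs(pi - P) for ti, pi in tp) / (2 * T * P * Q)
    r = 1.0 - sum(ti * sqrt(pi * (1 - pi) / (P * Q)) for ti, pi in tp) / T
    h = sum(ti * (E - entropy(pi)) for ti, pi in tp) / (E * T)
    s_zoloth = 1.0 - sum(ti * pi * (1 - pi) for ti, pi in tp) / (T * P * Q)
    s_variance = sum(ti * (pi - P) ** 2 for ti, pi in tp) / (T * P * Q)

    scores = {"G": g, "D": d, "R": r, "H": h, "S": s_zoloth, "S_alt": s_variance}
    return {name: 100.0 * value for name, value in scores.items()}


# Hypothetical area counts (group 1, group 2).
for name, score in fig22_indices([(8, 2), (5, 5), (1, 9), (6, 14)]).items():
    print(f"{name:5s} {score:6.2f}")
```

In each expression the weight on an area's term is its population, so dividing by T yields an average taken over individuals rather than over areas.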

The point I want to make about these formulas is that the quantitative result com- 
puted for individuals can be viewed as an individual-level residential outcome or 
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G = 100-(1/2T?PQ)-=tit;|pi-p,| (James and Taeuber 1985:5) 
D = 100-(1/2TPQ)-<ti|pi-P| (James and Taeuber 1985:6) 


R= 100 [1 -(1/T)-X tr/piqi/PQ | (Appendix F, this monograph) 


H = 100->t;[(E—E\)/ET] (Massey and Denton 1988:285) 


S = 100-(1.0 — [(trpiqi)/TPQ]) (Zoloth 1976:282) or 
100 -(1/TPQ):= ti(pi — P)? (James and Taeuber 1985:6) 


Fig. 2.2 Examples of area-based computing formulas for indices of uneven distribution that 
implicitly feature overall averages on individual-level residential outcomes (Notes: N; and N, 
denote city-wide population counts for the two groups in the comparison; T = N; + N3; P = N,/T; 
Q=N,/T; i denotes area; n, and n denote the area counts for the two groups in the segregation 
comparison; t = n; + Ny; pi = nj;/t;; qi = no/t;; X; and Y; denote the cumulative proportions of groups 
1 and 2, respectively, over areas ranked from low to high on p;; and E denotes entropy for the city 
overall given by E = P-Log,(1/P) + Q-Log,(1/Q) and E; denotes entropy for area i given by E; = 
pi'Log,(1/p;) + q;-Log>(1/q;). A summary of notation is given in the Appendices) 


residential attainment. I emphasize this point with the formulas listed in Fig. 2.3. 
These are alternative, mathematically equivalent versions of the formulas given in 
Fig. 2.2. The only difference is that the formulas have been rearranged to highlight 
and clarify how each index can be understood as an overall average of residential 
outcome scores (y) for individuals. A more detailed discussion of these formulas are 
given in the Appendices. Here I limit my comments to noting that the residential 
outcome terms (y) can be characterized as registering the degree to which the racial 
composition in the area the individual resides in departs from the racial composition 
of the city. In the case of G, D, H, and the first formula for S, the calculation of the 
departure score involves a city-specific constant that “scales” results so the final 
index score will fall in the range 0-1. 
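
As a rough numerical check on the equivalence described above, the following sketch computes D and S for a small hypothetical city in two ways: once with area-based computing formulas of the kind shown in Fig. 2.2, and once as overall population averages of individual scores (y) of the kind shown in Fig. 2.3. The area counts and function names are illustrative assumptions, not data or code from this monograph, and scores are left on the 0-1 scale (multiply by 100 for the scaling used in the figures).

```python
# Hypothetical area counts (n1 = Group 1, n2 = Group 2); not real data.
areas = [(90, 10), (60, 40), (30, 70), (20, 80)]

N1 = sum(n1 for n1, _ in areas)
N2 = sum(n2 for _, n2 in areas)
T = N1 + N2
P, Q = N1 / T, N2 / T


def area_formulas(areas):
    """D and S from area-based computing formulas, on the 0-1 scale."""
    d = sum((n1 + n2) * abs(n1 / (n1 + n2) - P) for n1, n2 in areas) / (2 * T * P * Q)
    s = sum((n1 + n2) * (n1 / (n1 + n2) - P) ** 2 for n1, n2 in areas) / (T * P * Q)
    return d, s


def population_averages(areas):
    """D and S as averages of individual scores y over all T individuals."""
    y_d, y_s = [], []
    for n1, n2 in areas:
        t = n1 + n2
        p = n1 / t
        # Every resident of an area receives the same score for that area.
        y_d += [abs(p - P) / (2 * P * Q)] * t
        y_s += [(p - P) ** 2 / (P * Q)] * t
    return sum(y_d) / T, sum(y_s) / T


D_area, S_area = area_formulas(areas)
D_avg, S_avg = population_averages(areas)
print(round(D_area, 4), round(D_avg, 4))
print(round(S_area, 4), round(S_avg, 4))
```

In this example both routes give the same values because the individual scores are simply the area terms assigned to each resident of the area.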

These formulations show that, if one chooses to do so, all popular measures of 
uneven distribution can be expressed in terms of individual residential outcomes. 
While this option has been available for most measures for many decades, mathe- 
matical expressions of this form have not been as widely used and discussed as the 
standard computing formulas. One reason for this is that formulating indices of 
uneven distribution as overall population averages on residential outcomes does not 
provide any significant practical advantages. Another reason is that these formula- 
tions do not support substantive interpretations that are viewed as useful and com- 
pelling for the study of segregation. Most studies that measure uneven distribution 
are motivated by the assumption that it ultimately carries important implications for 
group differences in residential distributions and residential outcomes. Casting 
uneven distribution as an overall average for residential outcomes, while a viable 
mathematical option, does not speak directly to a substantive interest focused on 
group differences in residential distributions and residential outcomes. Nevertheless, 
these formulations are relevant for my purposes because they make it clear that all 
Index   Averaging Scores for y Over All Individuals       Scores Assigned to Individuals Based on Scaling Function $y_k = f(p_i)$

G       $G = 100 \cdot (1/T) \cdot \sum_k y_k$             $y_k = \sum_m \lvert p_k - p_m \rvert / 2TPQ$
D       $D = 100 \cdot (1/T) \cdot \sum_k y_k$             $y_k = \lvert p_i - P \rvert / 2PQ$
R       $R = 100 \cdot [1 - (1/T) \cdot \sum_k y_k]$       $y_k = \sqrt{p_i q_i / PQ}$
H       $H = 100 \cdot (1/T) \cdot \sum_k y_k$             $y_k = (E - E_i) / E$
S       $S = 100 \cdot (1/T) \cdot \sum_k y_k$             $y_k = (p_i - P)^{2} / PQ$ or, alternatively,
        $S = 100 \cdot [1 - (1/T) \cdot \sum_k y_k]$       $y_k = p_i q_i / PQ$


Fig. 2.3 Formulas explicitly casting values of indices of uneven distribution as overall population
averages on individual residential outcomes (y) (Notes: k and m index individual households; $p_i$
denotes the pair-wise area proportion for the reference group in the i’th area; $p_k$ denotes the value
of $p_i$ for the k’th household and $p_m$ denotes the value of $p_i$ for the m’th individual; See notes to
Figs. 2.1 and 2.2 for other terms)


indices of uneven distribution have definite relations to residential outcomes for 
individuals. 

Thinking about this led me to raise two questions that are central to this study. 
They are “Can indices of uneven distribution be formulated in a way that provides 
direct insights regarding group differences in residential outcomes?” and, if so, 
“How specifically do indices of uneven distribution register group differences on 
neighborhood residential outcomes?” The formulas presented in Fig. 2.4 address 
these questions. The formulas given here cast popular indices of uneven distribution 
as differences of means on individual residential outcomes (y) that are scored on the 
basis of the pairwise group proportion (p) for the area of residence. These expres- 
sions are new to this monograph and have not been presented previously in the lit- 
erature on segregation measurement. 

These formulas play a crucial role in this study; they constitute the mathematical 
basis for what I term the “difference of means” framework for segregation measure- 
ment. Accordingly, I review these formulas in more detail in Chap. 3 and I also 
provide additional technical discussions and derivations as Appendices. I conclude 
this short chapter with a few additional comments. This chapter establishes the point 
that all popular indices of uneven distribution can be given in a variety of mathemat- 
ically equivalent formulations. Some are convenient for computing; some support 
attractive substantive interpretations; and some reveal how segregation is connected 
to residential outcomes for individuals and how these may differ across groups. All 
can be used to obtain correct values for index scores and thus they all are inter- 
changeable for that narrow purpose. The new formulas introduced in Fig. 2.4 defi- 
nitely can be used for this purpose. But that is not their main claim to fame. Their 
value to segregation research is that they provide unique advantages for segregation 
Index   Difference of Group Means on y                     Residential Outcome Scores (y) Assigned to Individuals Based on $y_i = f(p_i)$

G       $G = 100 \cdot 2(\bar{Y}_1 - \bar{Y}_2)$           $y_i = f(p_i)$ = relative rank (quantile scoring) on $p_i$

D       $D = 100 \cdot (\bar{Y}_1 - \bar{Y}_2)$            $y_i = f(p_i)$ = 0 if $p_i < P$, 1 if $p_i \ge P$

        Alternatively, compute D as a simplified version of G based on collapsing area
        values for $p_i$ into a two-category rank scheme consisting of areas where $p_i < P$
        and areas where $p_i \ge P$.

A       No direct difference of group means solution is available but $A = 2R - R^{2}$ for the
        “symmetric” version of A (i.e., A when $\alpha = \beta = 0.5$).

R       $R = 100 \cdot (\bar{Y}_1 - \bar{Y}_2)$            $y_i = Q + (1 - \sqrt{p_i q_i / PQ}) / (p_i/P - q_i/Q)$
H       $H = 100 \cdot (\bar{Y}_1 - \bar{Y}_2)$            $y_i = Q + [(E - E_i)/E] / (p_i/P - q_i/Q)$
S       $S = 100 \cdot (\bar{Y}_1 - \bar{Y}_2)$            $y_i = p_i$


Fig. 2.4 Formulas casting values of indices of uneven distribution as differences of group means
($\bar{Y}_1 - \bar{Y}_2$) on individual residential outcomes (y) (Notes: $\bar{Y}_1$ and $\bar{Y}_2$ are group averages given by
$\bar{Y}_1 = (1/N_1) \sum y_i$ and $\bar{Y}_2 = (1/N_2) \sum y_i$ with i denoting individuals in the relevant group; $p_i$
denotes the pairwise area proportion for the reference group in the area where the i’th individual
resides and $y_i$ is the residential outcome score generated by the index-specific scoring function
$f(p_i)$. See notes to Figs. 2.1 and 2.2 for other terms)


measurement and new options for segregation analysis. They do so by placing all 
popular indices of uneven distribution in a common framework wherein all indices 
are given as group differences of means on individual residential outcomes (y) that 
are scored from the pairwise racial composition (p) of the area in which the indi- 
vidual resides. This framework provides a new basis for understanding, interpreting, 
and comparing familiar indices. It also opens the door to innovations in segregation 
measurement and analysis. I explore these possibilities in more detail in the remaining
chapters of this monograph, starting next with an overview of the “difference of
means” framework.
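
A brief sketch of the algebra may help show why scoring functions of this kind reproduce index values. It is offered only as an illustration under the area-level notation of Fig. 2.2 (with $n_{1i} = t_i p_i$, $n_{2i} = t_i q_i$, $N_1 = TP$, $N_2 = TQ$), with results on the 0-1 scale; it is not a substitute for the derivations in the Appendices. The case for R assumes $p_i \neq P$ in every area so that the scoring function is defined.

```latex
% General identity: a difference of group means is a weighted sum over areas.
\begin{align*}
\bar{Y}_1 - \bar{Y}_2
  &= \frac{1}{N_1}\sum_i n_{1i}\, y_i - \frac{1}{N_2}\sum_i n_{2i}\, y_i
   = \frac{1}{T}\sum_i t_i \left(\frac{p_i}{P} - \frac{q_i}{Q}\right) y_i ,
  \qquad \frac{p_i}{P} - \frac{q_i}{Q} = \frac{p_i - P}{PQ}. \\[4pt]
% Separation index: y_i = p_i
\text{For } S,\ y_i = p_i:\quad
\bar{Y}_1 - \bar{Y}_2
  &= \frac{1}{TPQ}\sum_i t_i (p_i - P)\, p_i
   = \frac{1}{TPQ}\sum_i t_i (p_i - P)^2 ,
  \quad\text{because } \sum_i t_i (p_i - P) = 0. \\[4pt]
% Square root index: y_i = Q + (1 - sqrt(p_i q_i / PQ)) / (p_i/P - q_i/Q)
\text{For } R:\quad
\bar{Y}_1 - \bar{Y}_2
  &= \frac{Q}{T}\sum_i t_i \left(\frac{p_i}{P} - \frac{q_i}{Q}\right)
     + \frac{1}{T}\sum_i t_i \left(1 - \sqrt{p_i q_i / PQ}\right)
   = 1 - \frac{1}{T}\sum_i t_i \sqrt{p_i q_i / PQ}.
\end{align*}
```

In the last line the first sum vanishes because $\sum_i t_i p_i / P = \sum_i t_i q_i / Q = T$, which leaves the area-based expression for R.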
Chapter 3
Overview of the “Difference of Means” 
Framework 


The previous chapter notes that popular indices of uneven distribution can be 
expressed in a variety of mathematically equivalent ways. The discussion there (and 
in Appendices) reviews a variety of formulas presented previously in the literature. 
It also introduces a set of new formulas that cast indices of uneven distribution as 
group differences of means on individual residential outcomes. I argue that the 
group difference of means formulation is an important new approach that brings 
many advantages and possibilities to segregation measurement and analysis. To 
make the case for this view I now provide a more detailed discussion comparing 
standard computing formulas with the difference of means formulas. 


3.1 Index Formulas: The Current State of Affairs 


As I noted briefly in the previous chapter, popular measures of segregation such as 
the widely used dissimilarity or delta index (D) traditionally have been formulated 
and interpreted from a perspective that focuses attention on outcomes for areas 
rather than outcomes for individuals. For example, the following formula from 
Duncan and Duncan (1955: 211) highlights area differences in relative group pres- 
ence — specifically, the area’s share (s) of the group’s city-wide population — for the 
two groups in the comparison. This formula is widely used to compute D because it 
is computationally efficient and easy to implement. In addition, the focus on varia- 
tion in area outcomes is seen as an appealing basis for understanding and assessing 
the extent to which two groups are distributed unevenly across the residential areas 
of a city. 


$D = 100 \cdot \frac{1}{2} \sum_i \lvert (n_{1i}/N_1) - (n_{2i}/N_2) \rvert$
$D = 100 \cdot \frac{1}{2} \sum_i \lvert s_{1i} - s_{2i} \rvert$


This aggregate-level approach is not unique to the era in which Duncan and Duncan 
were writing or to the dissimilarity index. More than four decades later Hutchens 
introduced a new measure of uneven distribution termed the square root index (R) 
and drew on a similar formulation to clarify how R assesses the extent to which two 
groups are distributed unevenly across the residential areas of a city (Hutchens 
2001: 23). 


$R = 100 \cdot \left( 1.0 - \sum_i \sqrt{(n_{1i}/N_1) \cdot (n_{2i}/N_2)} \right)$
$R = 100 \cdot \left( 1.0 - \sum_i \sqrt{s_{1i} s_{2i}} \right)$


In the formula for D, uneven distribution is assessed as 0 only when the “area share 
scores” (s) for the two groups in the comparison are exactly equal in all areas of the 
city. The same is true for the formula for R.¹
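
The two share-based formulas can be tried out with a few lines of code. The sketch below is a minimal illustration with hypothetical counts and function names of my choosing (they are assumptions, not part of the original sources); it computes D and R from area share scores and checks that both equal 0 when the two groups have identical shares in every area.

```python
import math


def shares(counts):
    """Convert one group's area counts into shares of its city-wide total."""
    total = sum(counts)
    return [c / total for c in counts]


def dissimilarity(n1, n2):
    """D on the 0-100 scale: half the sum of absolute share differences."""
    s1, s2 = shares(n1), shares(n2)
    return 100 * 0.5 * sum(abs(a - b) for a, b in zip(s1, s2))


def square_root_index(n1, n2):
    """Hutchens R on the 0-100 scale: one minus the sum of sqrt(s1 * s2)."""
    s1, s2 = shares(n1), shares(n2)
    return 100 * (1.0 - sum(math.sqrt(a * b) for a, b in zip(s1, s2)))


# Hypothetical counts: an uneven city, and a city with equal shares in all areas.
uneven = ([90, 60, 30, 20], [10, 40, 70, 80])
even = ([100, 50, 50], [20, 10, 10])

print(dissimilarity(*uneven), square_root_index(*uneven))
print(dissimilarity(*even), square_root_index(*even))
```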

Summary measures of uneven distribution formulated in this way have been and 
remain valuable tools for aggregate-level description. But the focus on outcomes for 
areas rather than individuals and groups imposes a significant limitation that Duncan 
and Duncan (1955) noted over 50 years ago. The limitation is that area-oriented 
formulations of D and other indices provide little basis for gaining insight into how 
underlying micro-level social processes of residential attainment give rise to the 
area patterns that determine the level of residential segregation for the city. 
Accordingly, Duncan and Duncan stated “In none of the literature on segregation 
indices is there a suggestion about how to use them to study the process of segrega- 
tion or change in the segregation pattern” (1955: 223; emphasis in original). The 
process of course plays out at the level of individuals and households, not for areas. 
Indeed, the areas often are defined as statistical units with no intrinsic sociological 
qualities relevant for segregation process; they are merely useful constructs for 
assessing group differences in residential distribution. So formulas that focus atten- 
tion on outcomes for areas are at a level of abstraction removed from “where the 
action is” in segregation dynamics. 

Duncan and Duncan additionally noted it would be desirable, but was not then 
possible, to incorporate controls for the role of individual-level factors (e.g., labor 
force status, occupation, income, etc.) beyond race when seeking to understand and 
explain the level of segregation in a city. Unfortunately, efforts to achieve this goal
were frustrated then, and remain frustrated now, by thinking about segregation
solely from the point of view of the area-oriented computing formulas given above.
The formulas are framed in terms of outcomes for areas, not in terms of individual 


¹ This may be less obvious for R because it is a relatively recent addition to the literature. When
share scores (s) for groups are equal in an area, the square root of the product of the two share 
scores will equal the value of the individual share scores. The resulting terms will sum to 1 when 
group share scores are equal in all areas and the index score will be 0. 
residential outcomes. So it is no surprise that it is not easy to use them to gain
insights into how index scores arise from an underlying micro-level process where 
potentially many factors play a role in shaping the residential outcomes individuals 
attain. 

When segregation is conceptualized and analyzed from the point of view of
outcomes for areas, it is very difficult to take account of the role of even a single 
social or economic characteristic beyond race and it is completely infeasible to take 
account of the role of several social and economic characteristics at the same time. 
Past efforts to achieve the goal of controlling for the role of non-racial characteris- 
tics have been limited to computing index scores using group subsamples that are 
matched on one or more relevant social characteristics (e.g., income). This approach 
is untenable in practical application because analysis quickly comes to be based on 
very small subgroup counts if one measures non-racial characteristics in fine- 
grained ways and/or if one tries to control for more than one or two non-racial 
characteristics at the same time. Accordingly, the approach is used infrequently in 
the empirical literature. When it is used, implementations are crude and unsatisfying 
and the resulting index scores are likely to be problematic on technical grounds. The 
implementations are crude because fine-grained distinctions quickly lead to small 
subgroup counts. Consequently, “matching” on non-racial characteristics can at 
most involve one or two characteristics and an interval variable such as income must 
be grouped into very broad categories. Yet even with these compromises, subgroup 
counts wind up being much smaller than overall counts and this then leads to techni- 
cal problems relating to index bias, a concern I discuss in detail in Chaps. 14, 15, 
and 16. 

In short, it is a disappointing state of affairs. In the six decades that have passed 
since Duncan and Duncan raised these important and fundamental concerns, the 
problems they identified have yet to be adequately addressed. Researchers continue 
to formulate indices of uneven distribution from area-oriented perspectives that 
leave the connections between index scores and individual-level residential attain- 
ments, and the related micro-level processes that shape them, unspecified and poorly 
understood. As a consequence, research on residential segregation has become 
increasingly out of step with the broader literatures investigating racial and ethnic 
inequality and disparity in socioeconomic outcomes such as education, occupation, 
and income. Studies of racial and ethnic differences in other socioeconomic out- 
comes have for many decades routinely drawn on micro-level models of individual 
attainment to gain insights into how many different factors may contribute to the 
creation of aggregate-level (i.e., national- and community-level) group disparities. 
In contrast, the literature on segregation has had to limit its focus to assessing 
aggregate-level segregation leaving the implications for and connections to group 
differences in individual residential outcomes uncertain and unexamined. 

To be fair, a vibrant and important literature focusing on individual-level residen- 
tial attainment has emerged in recent decades (e.g., Alba and Logan 1993; Logan 
and Alba 1993; Logan et al. 1996; Alba et al. 1999; South and Crowder 1997, 1998). 
But it has developed as a separate literature that is only loosely connected with 
research investigating segregation at the aggregate-level. The reason for this is that 
the dependent variables in analyses of individual residential attainment do not correspond
to terms that figure directly in the calculation of segregation index scores.
Accordingly, studies of individual residential attainments to date do not, and logi- 
cally cannot, provide direct insights into the values of D or other aggregate-level 
summary indices of uneven distribution. Conversely, studies of aggregate-level seg- 
regation cannot directly provide insights into the parameters of individual-level 
residential attainment processes. 

The current state of affairs is unfortunate and unsatisfactory. Interest in segrega- 
tion generally rests on an implicit assumption that segregation has important asso- 
ciations with group differences on neighborhood residential outcomes that are 
relevant for socioeconomic attainment and inequality in life chances. Individuals 
and households strive to attain these residential outcomes either for their own sake 
(e.g., as markers of social position) or because they are closely correlated with fac- 
tors that impact life chances (e.g., exposure to crime, social problems, schools, ser- 
vices, neighborhood amenities, etc.). In view of this, it is clearly desirable to gain a 
better understanding of how different segregation indices relate to group differences 
on individual-level residential outcomes. Surprisingly, the methodological literature 
on segregation measurement is nearly silent on this issue. Segregation measurement 
theory gives attention to many properties and qualities of aggregate-level indices 
but it has not taken up the question of how different indices relate to individual-level 
residential outcomes or carry different implications for group differences on resi- 
dential outcomes. 


3.2 The Difference of Means Formulation — The General 
Approach 


I address this gap in the measurement literature by casting popular measures of 
uneven distribution as differences of group means on segregation-relevant individ- 
ual residential outcomes. Specifically, I place familiar segregation indices in a com- 
mon “difference of means” framework in which the index score “S” is given as 


$S = \bar{Y}_1 - \bar{Y}_2$


where:


S is the score of the relevant segregation index (i.e., G, D, R, H, or S),

$\bar{Y}_1$ is the mean on y for individuals in Group 1 based on either $(1/N_1) \cdot \sum n_{1i} y_i$ when
computed for area data or $(1/N_1) \cdot \sum y_{1k}$ when computed for individual data,

$\bar{Y}_2$ is the mean on y for individuals in Group 2 based on either $(1/N_2) \cdot \sum n_{2i} y_i$ when
computed for area data or $(1/N_2) \cdot \sum y_{2k}$ when computed for individual data,

$n_{1i}$ and $n_{2i}$ are the counts of Groups 1 and 2, respectively, in the i’th area,

$p_i$ is the pairwise area proportion for Group 1 in the i’th area based on
$p_i = n_{1i} / (n_{1i} + n_{2i})$,

$y_i$ is the residential outcome score (y) for the i’th area scored as a function of the
pairwise area group proportion $y_i = f(p_i)$,

$y_{1k}$ indicates the residential outcome (y) for the k’th individual in Group 1 (set equal
to the residential outcome score for the area in which the individual resides), and

$y_{2k}$ indicates the residential outcome (y) for the k’th individual in Group 2 (set equal
to the residential outcome score for the area in which the individual resides).
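
The definitions above translate directly into a short routine. The following is a minimal sketch, assuming hypothetical area counts and helper names that do not come from this monograph; it computes the difference of group means from area data for any scoring function f, repeats the calculation from individual-level records, and uses the separation index scoring (y = p) as the example. Scores are on the 0-1 scale.

```python
def difference_of_means(areas, f):
    """Mean of y for Group 1 minus mean of y for Group 2, computed from area data.

    `areas` is a list of (n1, n2) counts and `f` maps the pairwise area
    proportion p to a residential outcome score y.
    """
    N1 = sum(n1 for n1, _ in areas)
    N2 = sum(n2 for _, n2 in areas)
    sum1 = sum2 = 0.0
    for n1, n2 in areas:
        y = f(n1 / (n1 + n2))  # y_i = f(p_i) for the area
        sum1 += n1 * y
        sum2 += n2 * y
    return sum1 / N1 - sum2 / N2


def difference_of_means_individual(y1k, y2k):
    """Same quantity computed from lists of individual outcome scores."""
    return sum(y1k) / len(y1k) - sum(y2k) / len(y2k)


areas = [(90, 10), (60, 40), (30, 70), (20, 80)]  # hypothetical counts

# Separation index scoring: y = p.
S_area = difference_of_means(areas, lambda p: p)

# Expand to individuals: each person carries the score of their area.
y1k = [n1 / (n1 + n2) for n1, n2 in areas for _ in range(n1)]
y2k = [n1 / (n1 + n2) for n1, n2 in areas for _ in range(n2)]
S_individual = difference_of_means_individual(y1k, y2k)

print(round(S_area, 4), round(S_individual, 4))
```

Passing a different scoring function changes which index is obtained while the averaging step stays the same, which is the sense in which the indices share a common computing framework.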


I hold that formulating segregation indices in this way is useful for both concep- 
tual and practical reasons. First, it provides a new interpretation for aggregate seg- 
regation indices; they now can be understood as registering simple group differences 
on residential outcomes (y) scored based on area group proportion (p) which has an 
easy, straightforward interpretation as (pairwise) contact with or exposure to Group 
1 (i.e., the reference group) based on co-residence. Simple co-residence, of course,
does not necessarily imply harmonious social interaction. But it does indicate com- 
mon fate regarding many neighborhood outcomes and many shared residential 
experiences. On this basis, it is a potentially important and meaningful social 
indicator. 

Second, this new approach to computing index values places different indices in 
a uniform, common computing framework that highlights differences between mea- 
sures on a single, specific point of comparison — the manner in which each index 
registers neighborhood residential outcomes (y) based on area group proportion (p). 
Since area group proportion can be understood as contact or exposure based on co- 
residence with Group 1, all of the indices can be interpreted as group differences in 
average “scaled contact” with Group 1. Differences between indices ultimately 
trace to differences in the specific way that residential outcomes (y) are quantita- 
tively scored based on area group proportion (p). Consequently, differences between 
indices can be seen as arising solely from differences in the index-specific form of 
the scaling function y=f (p). This provides a new basis for evaluating segregation 
indices; they can be compared on the substantive relevance of how each index reg- 
isters residential outcomes (y) based on contact and exposure with Group 1 as 
embodied by area group proportion (p). 

Third, the segregation-relevant residential outcomes (y) used to compute the seg- 
regation index score can directly serve as dependent variables in individual-level 
residential attainment analyses. Thus, in the difference of means formulation, the 
segregation index score can be equated to the effect of group membership (e.g., 
coded 0 or 1) in an individual-level residential attainment analysis for the city. This 
carries minimal practical value for the specific task of estimating index scores because
the scores can be readily obtained by simpler methods. But it is important because 
it expands options for understanding and analyzing segregation. It unifies the study 
of aggregate segregation with the study of residential attainment in a single frame- 
work. In doing so it opens the door to a host of new options for segregation analysis 
including, for example, the ability to easily take account of the role that factors other 
than group membership (e.g., income) may play in determining segregation and the 
ability to use multi-level models of residential attainment to study cross-area and 
cross-time variation in segregation. 
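
The equivalence between an index score and the effect of group membership can be illustrated with a bivariate regression. The sketch below is a simplified illustration only: it uses hypothetical area counts, scores y with the separation index function (y = p), codes group membership 1 for Group 1 and 0 for Group 2, and computes the least-squares slope by hand rather than with a statistics package. Models with additional covariates or multiple levels would require a proper estimation routine.

```python
def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


areas = [(90, 10), (60, 40), (30, 70), (20, 80)]  # hypothetical counts

# Build individual-level records: group dummy and residential outcome y = p.
group, y = [], []
for n1, n2 in areas:
    p = n1 / (n1 + n2)
    group += [1] * n1 + [0] * n2
    y += [p] * (n1 + n2)

effect = ols_slope(group, y)

# Direct difference of group means for comparison.
mean1 = sum(v for g, v in zip(group, y) if g == 1) / group.count(1)
mean2 = sum(v for g, v in zip(group, y) if g == 0) / group.count(0)

print(round(effect, 4), round(mean1 - mean2, 4))
```

With a single 0/1 predictor the slope is identical to the difference of group means, so here it matches the value of S on the 0-1 scale.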
3.3 Additional Preliminary Remarks on Implementation


The key to implementing the new approach is to identify for each index a scoring 
system for neighborhood outcomes (y) that will yield the segregation index score as 
a difference of group means ($\bar{Y}_1 - \bar{Y}_2$). I have identified relevant scoring systems for
five indices that are widely used to measure the unevenness dimension: the gini
index (G), the delta or dissimilarity index (D), the separation index (S) (also known
as the variance ratio index [V]), the Theil entropy index (H), and the Hutchens
square root index (R), a measure that is closely associated with the “symmetric”
implementation of the Atkinson index (A).²,³

For all of these indices, the residential outcome (y) is scored as a function of 
“pairwise” group proportion (p) for the area the individual resides in. Indexing areas 
by “i”, $p_i$ is given as


$p_i = n_{1i} / (n_{1i} + n_{2i})$


where $p_i$ is the Group 1 proportion in the combined population of Group 1 and
Group 2 in the i’th area. In this formulation the scoring system for each index rests
on a “scaling” function $y = f(p)$ that maps area group proportion scores (p) on to
index-specific residential outcome scores (y). I discuss the index-specific scaling
functions $y = f(p)$ in the chapters that follow and in Appendices.

Before continuing, I comment briefly to note two technical points. One is that the 
designation of which group serves as “Group 1” is arbitrary. One group must be so 
designated. But the result for the index score will be the same regardless of which 
of the two groups is chosen as the reference. White (1986) termed this index property
“symmetry.” When one group is understood as a majority group and the other
as a minority group, it has been conventional in previous research to designate the 
majority group as Group 1. This is not required, but it is convenient because it facili- 
tates interpreting segregation as reflecting the extent to which the minority group 
has less contact with the majority group than would occur under even distribution. 
This has generally been viewed as useful based on the assumption that areas of 
majority group residence are advantaged and thus disparity and disadvantage in
residential outcomes follow when contact with the majority falls below parity. But
it is only a custom, not a logical requirement. If the roles of the two groups are


² The separation index (S) is known by many names including the revised index of isolation (Bell
1954), the correlation ratio and eta squared ($\eta^{2}$) (Duncan and Duncan 1955), r or $r_{ij}$ (Coleman
et al. 1975, 1982), variance ratio (V) (James and Taeuber 1985), and segregation index (S) (Zoloth
1976).

³ I note below that A is an exact nonlinear function of R. To date I have not identified the relevant
scoring system for Atkinson’s Index (A). A is rarely used in empirical studies and has been criticized
on conceptual grounds for being asymmetric such that A for White-Black segregation may
differ from A for Black-White segregation (e.g., White 1986). But it has received attention in the
segregation measurement literature. So this remains a task for future research.
reversed, the contact interpretation will be reversed. But all substantive implications
of the patterns of group differences in contact will remain intact and unchanged. 

The second technical point I mention is that p is computed using only counts for 
the two groups in the segregation comparison. Thus, p is not Group 1’s proportion 
among the total population of the area; it is Group 1’s proportion among the combined
count of the two groups in the segregation analysis. To emphasize this point,
I sometimes term p as a “pairwise” group proportion. However, as this is the pri- 
mary way I use p in this monograph, I often drop the “pairwise” modifier in the 
interest of economy of expression. Note that this “pairwise” construction is not at 
all controversial in segregation measurement; relevant terms in all of the standard 
formulas for measures of uneven distribution reviewed earlier are based on pairwise 
implementations of group proportions for areas (i.e., p and q) and for the city as a 
whole (i.e., P and Q). 
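
A small numerical example may make the distinction concrete. The counts below are hypothetical and the variable names are my own labels; the point is only that the pairwise proportion ignores residents who belong to neither group in the comparison.

```python
# Hypothetical area with three groups; only Groups 1 and 2 are being compared.
area = {"group1": 300, "group2": 100, "other": 600}

# Pairwise proportion: Group 1 among the combined count of Groups 1 and 2.
pairwise_p = area["group1"] / (area["group1"] + area["group2"])

# Share of the total area population, which is NOT what p denotes here.
total_share = area["group1"] / sum(area.values())

print(pairwise_p, total_share)
```

In this example Group 1 makes up 30 percent of the area's total population but 75 percent of the two-group population used to score p.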

The general outline of the approach is now set. The next task is to review how the 
difference of means framework can be implemented with the most popular and 
widely used segregation indices. 
Chapter 4
Difference of Means Formulations for Selected 
Indices 


In this chapter I review the implementation of the difference of means framework 
for calculating indices of uneven distribution for five indices: the delta or dissimilar- 
ity index (D), the gini index (G), the separation index (S), the Theil entropy index 
(H), and the Hutchens square root index (R). For each index I introduce the relevant
scoring system for residential outcomes (y) that makes it possible to obtain the
index scores as simple differences of group means on individual residential outcomes
(y). To facilitate discussion, I replace the abstract terms “Group 1” and
“Group 2” with the more concrete example of Whites and Blacks which has been 
investigated in hundreds of empirical analyses of uneven distribution in U.S. cities, 
urban areas, and metropolitan areas. As I move from index to index, I provide com- 
ments on the nature of the scaling function that maps scores for contact and expo- 
sure based on pairwise area proportion White (p) onto index-specific residential 
outcome scores (y). In addition, I sometimes offer commentary on the index. Note, 
however, that I do not provide a comprehensive review of the five indices because 
this task has been addressed previously in the existing literature and does not need 
to be repeated here. 


4.1 Scoring Residential Outcomes (y) for the Delta or 
Dissimilarity Index (D) 


I begin with the delta or dissimilarity index (D) because it is by far the most widely 
used index of uneven distribution. I review two scoring schemes for the function 
y=f (p) for the delta index (D). One is based on interpreting D as a crude variant 
of the gini index (G). I discuss this below after I have introduced and reviewed the 
scoring scheme for G. First, however, I review a scoring scheme for D that is espe- 
cially simple, easy to explain, and attractive on substantive grounds. In this scheme 
D is obtained as a difference of group means ($\bar{Y}_1 - \bar{Y}_2$) based on assigning residential
outcomes (y) for individuals a value of either 0 or 1 based on whether area proportion
White (p) for their area of residence equals or exceeds proportion White for the
city (P).¹ Thus, the relevant scaling function $y = f(p)$ for D is a monotonic, binary
step function where $y = 1$ when $p \ge P$ and 0 otherwise (i.e., when $p < P$), where
$p = n_1 / (n_1 + n_2)$ and $P = N_1 / (N_1 + N_2)$ per expressions introduced earlier with
counts for Whites being used for Group 1 (the reference group) and counts for
Blacks being used for Group 2 (the comparison group).
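
The binary scoring scheme is easy to verify numerically. The sketch below uses hypothetical White and Black counts and illustrative function names (assumptions, not code from this monograph). It computes D as a difference of group means under the step function, compares it with the familiar share-based computing formula, and also checks that treating areas exactly at parity (p = P) as 0 rather than 1 leaves the result unchanged. Scores are on the 0-1 scale.

```python
# Hypothetical (White, Black) counts; the second area sits exactly at parity.
areas = [(90, 10), (50, 50), (30, 70), (30, 70)]

N1 = sum(w for w, _ in areas)
N2 = sum(b for _, b in areas)
P = N1 / (N1 + N2)


def d_step(areas, parity_counts=True):
    """D as the White-Black difference in the proportion attaining parity."""
    mean_w = mean_b = 0.0
    for w, b in areas:
        p = w / (w + b)
        # Exact comparison of p with P works here because the example
        # counts produce exactly representable values.
        y = 1.0 if (p >= P if parity_counts else p > P) else 0.0
        mean_w += w * y / N1
        mean_b += b * y / N2
    return mean_w - mean_b


def d_shares(areas):
    """D from the share-based computing formula."""
    return 0.5 * sum(abs(w / N1 - b / N2) for w, b in areas)


print(round(d_step(areas), 4),
      round(d_step(areas, parity_counts=False), 4),
      round(d_shares(areas), 4))
```

All three values agree in this example; the area at parity contributes equal shares of both groups, so scoring it 0 or 1 shifts both group means by the same amount.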

I review the underlying formal basis for this scoring of residential outcomes 
in Appendices. The material also provides detailed discussions establishing the
formal basis for the scoring function $y = f(p)$ for all indices considered in the body
of this paper. The discussions are mostly dry and tedious. But I encourage inter- 
ested readers to review the discussions to verify the basis for the scoring func- 
tions and to gain additional insights into the underlying nature of different 
indices. 

The scoring of residential outcomes as either 0 or 1 based on whether area pro- 
portion White (p) equals or exceeds the city mean (P) supports a simple, straightfor- 
ward substantive interpretation of D in terms of group differences in “exposure” and 
“contact”. Specifically, D can be understood as the White-Black difference in the 
proportion in each group that experiences “parity” in (pairwise) contact with Whites. 
Parity here is equated to attaining at least the level of (pairwise) proportion White 
seen for the city overall. Noting this substantive interpretation for D introduces a 
theme that will recur throughout this chapter. It is that: 


D and all other popular indices of uneven distribution can be interpreted as measures of 
simple group differences on residential outcomes of scaled group “exposure” or “contact” 
for individuals. 


The contact interpretation of D is simple and easy to grasp. In light of this, it is 
surprising that it is so infrequently discussed in the broader literature. Instead, it is 
much more common for the substantive interpretation of D to be framed in terms of 
the extensiveness of group displacement from even distribution based on “volume 
of group movement”. In this interpretation D indicates “the minimum proportion of 
one group that would have to move to a new area to bring about even distribution.”²
This interpretation of D is useful for some purposes and it is often seen as an inter- 
pretation that is easy to convey to broad audiences. For example, it is relevant for 
policy analysis assessing consequences of segregation in terms of the “disruption” 
in residential patterns (or school attendance patterns) that would result if policies 


¹ The same result is obtained if y is set to 1 only when contact exceeds parity (i.e., p > P). This is because
assigning 0 or 1 in cases where p = P will shift the mean for both groups by the same amount and
thus have no impact on the index score.

² This interpretation rests on the assumption that only one group relocates (Zelder 1977). Minimum
“volume of movement” requirements can be quite different if members of both groups exchange
residential locations.
promoting integration were implemented.³ But the group difference of means interpretation
of D also is useful, very easy to compute, and very easy to convey to broad
audiences. So these are not decisive factors for the neglect of this straightforward 
contact interpretation of D. 

Regardless of what factor(s) account for it, the lack of attention given to the dif- 
ference of mean contact interpretation for D has an unwelcome consequence. It has 
led researchers to be less familiar with an important property of D; namely, that D 
is inherently insensitive to, and conveys very little information about, the quantita- 
tive magnitude of group differences on residential outcomes. For example, a high 
value of D means that a higher proportion of Whites than Blacks live in areas that 
attain parity with the city-wide level of proportion White. But the high value does 
not, and it inherently cannot, signal whether the kinds of areas that Whites and 
Blacks typically live in are relatively similar on proportion White or fundamentally 
different. Relatedly, while the value of D signals the minimum proportion of one 
group that will need to move to eliminate uneven distribution, it provides little 
insight into the changes in neighborhood outcomes that would result for the two 
groups in the comparison. 


The value of D does not indicate whether movement to bring about even distribution will 
lead to substantively important changes in the residential outcomes for the individuals in 
either group. Specifically, it does not indicate whether movement will bring about socio- 
logically meaningful changes in neighborhood racial composition. 


This is not a trivial concern. Accordingly I give it extended attention at several 
other points in this monograph as well as here. The reason it is not trivial can be put 
in simple terms. It is logically possible for D to take high values when Whites and 
Blacks live in areas that are fundamentally similar on area proportion White. In this 
circumstance, residential redistribution leading to integration would indeed require 
a high proportion of one group to move, but the movement will not lead to important 
changes in their residential outcomes or in their comparison on these outcomes with 
the other group. This important possibility appears not to be widely recognized and 
appreciated by segregation researchers. It is safe to say it is almost never recognized 
by broader consumers of segregation research. 
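
The possibility described above can be made concrete with a small numeric sketch. The Python below is illustrative only: the function name `d_and_s` and the toy area counts are assumptions, not data from the analyses in this monograph. It scores D from y = 1 when p equals or exceeds P (0 otherwise) and scores S from y = p, then takes the White-Black difference of means in each case.

```python
def d_and_s(white, black):
    """Return (D, S) on a 0-1 scale as White-Black differences of group means."""
    W, B = sum(white), sum(black)
    P = W / (W + B)                                   # city-wide pairwise proportion White
    p = [w / (w + b) for w, b in zip(white, black)]   # area proportion White

    def diff_of_means(y):
        y_white = sum(w * yi for w, yi in zip(white, y)) / W
        y_black = sum(b * yi for b, yi in zip(black, y)) / B
        return y_white - y_black

    D = diff_of_means([1.0 if pi >= P else 0.0 for pi in p])  # parity scoring (0 or 1)
    S = diff_of_means(p)                                       # contact scored directly, y = p
    return D, S

# Hypothetical city: eight all-White areas and two areas that are 90% White.
white = [100] * 8 + [90] * 2
black = [0] * 8 + [10] * 2
D, S = d_and_s(white, black)
print(round(100 * D, 1), round(100 * S, 1))  # 81.6 8.2
```

In this hypothetical city D is above 80 even though every Black resident lives in an area that is 90 % White, and the difference in average contact with Whites is only about 8 points.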

In my experience non-specialists and researchers alike overwhelmingly interpret
D in a way that overlooks this property, which leaves them prone to mistaken
inferences about the nature of segregation. Specifically, researchers
as well as non-specialists are prone to assume that high values of D necessarily 
indicate that most members of both of the groups in question live apart from each 
other in areas where their group predominates and as a result elimination of uneven 
distribution will lead to important changes in racial mix and associated residential 
outcomes for at least one group. This, of course, is sometimes the case. But it is 
important to recognize that it is not necessarily the case. Furthermore, this latter 
outcome is not an esoteric or unusual hypothetical possibility that can be safely 


3 For this concern, the replacement index might be the better choice as it assesses the minimum of 
overall population movement required to bring about even distribution (Farley and Taeuber 1974). 
ignored. To the contrary, as I show in empirical analyses I review in Chap. 6,
instances where high values of D occur but both groups on average live in neighbor- 
hoods that are fundamentally similar on neighborhood outcomes can be found with 
surprising frequency when one systematically examines group differences in resi- 
dential distribution in detail. 

It may be helpful to make the issue more concrete by considering an example 
where inattention to this issue can lead to an incomplete and potentially misleading 
understanding of segregation patterns. A relevant case is the comparison of White- 
Black segregation and White-Asian segregation. Studies generally find that, on 
average, D for White-Black segregation is higher than D for White-Asian segrega- 
tion. I find a similar result based on analysis of block-level data for core-based sta- 
tistical areas (CBSAs) in 1990, 2000, and 2010 with the median value of D being 
71.8 for White-Black segregation and 62.8 for White-Asian segregation.⁴ The
difference of 9.0 points is relatively modest and suggests that, while White-Asian
segregation is appreciably lower than White-Black segregation, White-Asian
segregation still should be seen as fairly high.

With values of D being so high for both comparisons, one might assume that 
Black and Asian residential outcomes would be relatively similar and that most 
members of both groups would tend to reside in areas where their group predomi- 
nates and not with Whites. But this is not the case at all. Blacks consistently reside 
in areas where Blacks predominate; across the CBSAs in the full data set the median 
for Black (pairwise) contact with Whites is 43.1 % and the median for Black (pair- 
wise) contact with Blacks is 56.9 %. In contrast, Asians rarely reside in areas where 
Asians predominate; across CBSAs the median for Asian (pairwise) contact with 
Whites is 83.4 % and the median for Asian (pairwise) contact with Asians is 16.6 %. 
These results indicate that White-Black segregation is quantitatively fundamentally 
different from White-Asian segregation even when they have similar values on 
D. The results for D do not suggest this, but results for the separation index (S) — an 
alternative measure of uneven distribution I discuss in more detail below — do signal 
that the group comparisons are very different. In the same analysis I found the 
median value for the separation index was 48.3 for White-Black segregation and 
13.8 for White-Asian segregation. The difference in the two segregation
comparisons is much more dramatic using S; the typical level of S for White-Black
segregation is more than three times the typical level of S for White-Asian segregation. This example
highlights that D is insensitive to an important aspect of segregation; namely, group 
residential separation and neighborhood racial polarization. I review this issue in 
more detail in the next chapter. 


4 The full data set of segregation scores is discussed in more detail in the next chapter of this
monograph. CBSAs are included in the analysis if the size of the smaller group is 2,500 or higher.
4.2 Scoring Residential Outcomes (y) for the Gini Index (G)


For the gini index (G), the relevant function $y = f(p)$ for scoring residential
outcomes (y) so G can be obtained from the difference of group means ($Y_1 - Y_2$) is
based on relative rank position (that is, quantile or percentile standing) on area
group proportion (p). Specifically, y = percentile scores based on p. This makes y an
ever-rising, monotonic, nonlinear function of p. The percentile scores can be
obtained as follows. Rank areas from low to high on p. Assign the first area (i.e.,
i = 1) the percentile score $y_1 = 100 \cdot (\tfrac{1}{2} t_1)/T$. Assign the remaining areas (i.e.,
areas i = 2, 3, 4, ..., I) percentile scores based on $y_i = 100 \cdot (\sum_{k=1}^{i-1} t_k + \tfrac{1}{2} t_i)/T$. Under
this scoring system, G/2 can be obtained from $Y_1 - Y_2$. Alternatively, G can be
obtained from $2 \cdot (Y_1 - Y_2)$. I review the basis for these expressions in Appendices.
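
The percentile scoring steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each area is scored at the midpoint of its percentile band (the population in lower-ranked areas plus half the area's own population), no area is empty, and the function name `gini_from_percentiles` and the toy counts are hypothetical.

```python
def gini_from_percentiles(white, black):
    """Gini index (0-100) as twice the White-Black gap in mean percentile score."""
    W, B = sum(white), sum(black)
    T = W + B
    # Rank areas from low to high on area proportion White (p).
    ranked = sorted(zip(white, black), key=lambda a: a[0] / (a[0] + a[1]))
    below = 0.0                  # population living in lower-ranked areas
    y_white = y_black = 0.0      # running group means of the percentile score
    for w, b in ranked:
        t = w + b
        y = 100.0 * (below + 0.5 * t) / T   # midpoint percentile score for this area
        y_white += w * y / W
        y_black += b * y / B
        below += t
    return 2.0 * (y_white - y_black)

# Hypothetical three-area city.
white = [10, 40, 50]
black = [50, 40, 10]
print(round(gini_from_percentiles(white, black), 6))  # 56.0
```

In the toy data the area percentile scores are 15, 50, and 85; the White mean is 64 and the Black mean is 36, so the difference of 28 doubles to G = 56.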

The fact that G can be obtained from percentile scoring of area group proportions 
results because G, when applied using its formulation as a segregation index, is a 
measure of ordinal or “rank order” inequality between groups. I have previously 
described this property of G in Fossett and Siebert (1997) and earlier, albeit less 
directly, in Fossett and South (1983). In Fossett and Siebert (1997: Appendix A) I 
note that G is equivalent to familiar indices of ordinal inequality and ordinal asso- 
ciation. Specifically, G is mathematically equivalent to Lieberson’s (1976) index of 
net difference (ND), a measure of ordinal inequality between groups and G also is 
equivalent to Somers’ (1962) $d_{yx}$, an index of ordinal association. Based on this,
Fossett and Siebert (1997) show that G can be given as


$$G = 100 \cdot \sum_i \sum_j y_{ij} \, (n_{1i}/N_1) \cdot (n_{2j}/N_2)$$

or, more compactly,

$$G = 100 \cdot \sum_i \sum_j y_{ij} \, (s_{1i} \cdot s_{2j})$$


where i and j index areas ranked from low to high on area proportion White (p), $y_{ij}$
is scored −1 if i < j, 0 if i = j, and 1 if i > j, and $s_{1i}$ and $s_{2j}$ are group share scores
given by $s_{1i} = n_{1i}/N_1$ and $s_{2j} = n_{2j}/N_2$. This formula reveals that G registers only
rank position on p and does not register the size of the quantitative differences
involved.

The difference of means formulation of G supports and clarifies the interpreta- 
tion of G as a measure of group difference on scaled “exposure” and “contact”. In 
the case of White-Black segregation, G is the White-Black difference in average 
relative rank position on contact with Whites (p).⁵ Alternatively, G is the
White-Black difference in exposure to area percentile rank on proportion White. G’s
equivalence to the index of net difference supports a related interpretation. G indicates the
difference between two probabilities for rank order comparisons of individual 


5 More carefully, G is twice the difference. Or, equivalently, G is obtained by expressing the
observed difference as a percentage of its maximum possible value (which is 0.50).
Whites and Blacks on area proportion White. The first, P(A), is the probability that
a randomly selected White will live in an area where proportion White is higher
than that for a randomly selected Black. The second, P(B), is the probability that a
randomly selected White will live in an area where proportion White is lower than
that for a randomly selected Black.⁶ The value of G is given by P(A) − P(B) where
$P(A) = \sum X_{i-1} Y_i$ and $P(B) = \sum X_i Y_{i-1}$, with X and Y denoting cumulative group
proportions over areas ranked from low to high on area proportion White. This
expands to $100 \cdot (\sum X_{i-1} Y_i - \sum X_i Y_{i-1})$, the formula for G from Duncan and Duncan
(1955) given in Fig. 2.1. The main value of the net difference interpretation
originally introduced by Lieberson (1976) is to drive home the point that G is a measure
of inter-group rank order inequality on the residential outcome of area proportion 
White (p). 
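
The net difference reading of G can be checked numerically. The sketch below is illustrative: the function names and toy counts are assumptions, and it assumes no area is empty. It computes P(A) and P(B) by comparing every pairing of a White resident's area with a Black resident's area on p, and separately evaluates the cumulative-proportion form of the Duncan and Duncan (1955) formula on a 0-1 scale.

```python
def net_difference(white, black):
    """Return (P(A), P(B)): chances a random White lives in an area with higher
    or lower proportion White than a random Black resident's area."""
    W, B = sum(white), sum(black)
    areas = [(w / (w + b), w / W, b / B) for w, b in zip(white, black)]
    p_a = sum(sw * sb for pw, sw, _ in areas for pb, _, sb in areas if pw > pb)
    p_b = sum(sw * sb for pw, sw, _ in areas for pb, _, sb in areas if pw < pb)
    return p_a, p_b

def duncan_gini(white, black):
    """G (0-1 scale) from cumulative group proportions X (Black) and Y (White)."""
    W, B = sum(white), sum(black)
    ranked = sorted(zip(white, black), key=lambda a: a[0] / (a[0] + a[1]))
    g = x_prev = y_prev = 0.0
    for w, b in ranked:
        x, y = x_prev + b / B, y_prev + w / W   # cumulative shares through this area
        g += x_prev * y - x * y_prev
        x_prev, y_prev = x, y
    return g

white = [10, 40, 50]   # hypothetical counts
black = [50, 40, 10]
p_a, p_b = net_difference(white, black)
print(round(p_a, 2), round(p_b, 2), round(p_a - p_b, 2))  # 0.65 0.09 0.56
print(round(duncan_gini(white, black), 2))                 # 0.56
```

Pairings in which the two residents live in areas tied on p count toward neither probability, which is why P(A) and P(B) need not sum to 1.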

It is useful to briefly contrast G with D. Unlike D, G satisfies the principle of 
transfers and on this basis is technically superior to D.⁷ But G is similar to D in
being unable to give a reliable signal about group separation and residential polar- 
ization. The reason for this is that G can take high values when the two groups in the 
comparison have similar distributions on area group composition. This is possible 
because G registers rank-order differences on area proportion White and will regis- 
ter such differences equally regardless of whether the quantitative differences on 
area proportion White are small or large. As a result, when one sees a high value on
G, it is impossible to know whether the underlying pattern of segregation involves 
extensive group separation and extreme neighborhood racial polarization such as 
that observed for White-Black segregation in Chicago or involves a more benign 
pattern with minimal group separation and fundamentally similar neighborhood 
fate. 


4.3 The Delta or Dissimilarity Index (D) as a Crude 
Version of G 


The index of dissimilarity (D) can be understood as a special case of the gini index 
(G). Specifically, D is equivalent to G when areas are ranked using a two-category 
scheme based on whether $p_i < P$ or $p_i \ge P$, rather than being ranked on the full
range of scores on $p_i$ as would be the case with G. Thus, D is a version of G
computed when areas are grouped into two categories based on whether or not they are
at or above average on proportion White. Accordingly, D can be obtained using the
formula for G after ranking areas on the basis of a two-value recoding of $p_i$ as either
1, when $p_i \ge P$, or 0 otherwise. These recoded values of $p_i$ are then used to score y
in terms of relative rank position on area group proportion as described above for G.


6 It is also possible for Whites and Blacks to tie when compared on area proportion White. But this
probability need not be computed as it does not directly determine the index score.


7 The principle of transfers is discussed in James and Taeuber (1985) and Reardon and Firebaugh
(2002).
Accordingly, the value of D/2 is given by $Y_1 - Y_2$, or alternatively, D is given by
$2 \cdot (Y_1 - Y_2)$. If one were graphing the segregation curves associated with D and G
for a given comparison, the data for G would produce a conventional segregation curve
while the data for D would produce a segregation curve in the form of a triangle. The two
curves would share three points: the two end points of the curve at (0,0) and (1,1) and one
point along the curve at (X,Y) where the values of X and Y are equal to the
proportion of Blacks and Whites, respectively, living in areas where $p < P$.

This provides insight into why D and G are highly correlated and why scores for 
D never exceed scores for G (i.e., $D \le G$). Both measures register White-Black
differences in relative rank position on $p_i$. However, G registers all rank differences on
$p_i$ while D registers only rank differences where group comparisons on $p_i$ are on
opposite sides of P. This accounts for the difference between D and G in how they
respond to population transfers or exchanges. G will register any transfer or
exchange that affects at least one household’s rank position on area proportion
White (p). D will register a transfer or exchange only if it causes the value of p for
at least one household to shift from $p < P$ to $p \ge P$ or vice versa.
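
The relationship between D and G can be verified with a short sketch. The code below is an illustration under assumptions: the function names and toy counts are hypothetical, and the percentile scoring uses the midpoint of each area's percentile band. It collapses areas into two categories (below parity, at or above parity), applies the G calculation to the two collapsed categories, and compares the result with the familiar absolute-difference formula for D.

```python
def gini(white, black):
    """G (0-1 scale) as twice the White-Black difference in mean percentile rank."""
    W, B = sum(white), sum(black)
    T = W + B
    # Drop empty categories, then rank from low to high on proportion White.
    ranked = sorted(((w, b) for w, b in zip(white, black) if w + b > 0),
                    key=lambda a: a[0] / (a[0] + a[1]))
    below = diff = 0.0
    for w, b in ranked:
        y = (below + 0.5 * (w + b)) / T     # midpoint percentile score
        diff += (w / W - b / B) * y
        below += w + b
    return 2.0 * diff

def dissimilarity_via_gini(white, black):
    """D obtained by applying G to a two-category recoding of areas."""
    W, B = sum(white), sum(black)
    low, high = [0, 0], [0, 0]
    for w, b in zip(white, black):
        # p_i >= P, tested with integer arithmetic to avoid rounding at parity
        bucket = high if w * (W + B) >= W * (w + b) else low
        bucket[0] += w
        bucket[1] += b
    return gini([low[0], high[0]], [low[1], high[1]])

def dissimilarity(white, black):
    """Conventional computing formula for D (0-1 scale)."""
    W, B = sum(white), sum(black)
    return 0.5 * sum(abs(w / W - b / B) for w, b in zip(white, black))

white = [10, 40, 50]   # hypothetical counts
black = [50, 40, 10]
print(round(dissimilarity_via_gini(white, black), 2),
      round(dissimilarity(white, black), 2),
      round(gini(white, black), 2))  # 0.4 0.4 0.56
```

The two routes to D agree, and D falls below G because the two-category recoding discards rank differences among areas on the same side of parity.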


4.4 Scoring Residential Outcomes (y) for the Separation 
Index (S) 


I use the term Separation Index (S) to refer to a measure that has been known by 
many names over the decades. A partial list of past names includes: the correlation 
ratio and eta squared ($\eta^2$) (Duncan and Duncan 1955; Stearns and Logan 1986;
Iceland et al. 2002), r or $r_{ij}$ (Coleman et al. 1975, 1982), the variance ratio (V)
(James and Taeuber 1985), and segregation index (S) (Coleman et al. 1966; Zoloth
1976).⁸ I term this measure the separation index because a high value on this index
gives a clear and reliable signal that the two groups in the comparison are
residentially separated and generally do not reside in the same areas.⁹ That is, it indicates
whether the two groups live apart from each other due to being concentrated in 
areas that are racially polarized in a pattern of “prototypical” segregation wherein, 
in the example of White-Black segregation, Whites live in predominantly White 
areas and Blacks live in predominantly Black areas. I clarify the basis for this claim 
in more detail shortly. 

For the separation index (S), the relevant function $y = f(p)$ for scoring
residential outcomes (y) so S can be obtained from ($Y_1 - Y_2$) is quite simple; it is the


8 Additionally, S is a special case of Bell’s (1954) revised index of isolation for the situation in 
which the population has only two groups. 

9 As used here, the term separation does not imply that the groups live in areas that are far apart in
distance. It implies only that they are residentially separated into distinctly different areas. These
can be far apart but they also can be adjoining as standard implementations of all measures of
uneven distribution are “aspatial” in that the arrangement of units in space does not affect index
values.




identity function y = p. I review the formal basis for this scoring of residential
outcomes for S in Appendices.¹⁰ The scaling function used for S is distinct from
those that are used for other popular indices of uneven distribution. It maps the
contact score (p) directly onto residential outcome scores (y) based on a one-to-one
tact score (p) directly onto residential outcome scores (y) based on a one-to-one 
linear relationship. In contrast, the scaling functions for all other indices map the 
contact score (p) onto residential outcome scores (y) based on some form of posi- 
tive, monotonic, nonlinear relationship. 
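
A brief sketch shows how little computation the y = p scoring requires. The code is illustrative: the function names and toy counts are assumptions, and areas are assumed to be non-empty. It computes S as the White-Black difference in mean area proportion White and compares it with the standard computing formula for the variance ratio, the sum over areas of t(p − P)² divided by TPQ.

```python
def separation_index(white, black):
    """S (0-1 scale) as the White-Black difference in mean contact with Whites."""
    W, B = sum(white), sum(black)
    p = [w / (w + b) for w, b in zip(white, black)]      # y = p for every area
    mean_white = sum(w * pi for w, pi in zip(white, p)) / W
    mean_black = sum(b * pi for b, pi in zip(black, p)) / B
    return mean_white - mean_black

def variance_ratio(white, black):
    """Variance ratio (eta squared) from the standard computing formula."""
    W, B = sum(white), sum(black)
    T = W + B
    P = W / T
    between = sum((w + b) * (w / (w + b) - P) ** 2 for w, b in zip(white, black))
    return between / (T * P * (1.0 - P))

white = [10, 40, 50]   # hypothetical counts
black = [50, 40, 10]
print(round(separation_index(white, black), 4),
      round(variance_ratio(white, black), 4))  # 0.2667 0.2667
```

In the toy data Whites live on average in areas that are 63.3 % White and Blacks in areas that are 36.7 % White, so S is about 26.7 on a 0-100 scale.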

The separation index supports a clear and appealing interpretation based on pair- 
wise “exposure” and “contact”. In the case of White-Black segregation, S is the 
White-Black difference on average contact with Whites (p). From this vantage 
point, it becomes clear why it is appropriate to refer to this measure as the “separa- 
tion index”. The White-Black difference in contact with Whites can be large only if 
Whites live separately from — that is, apart from, not with — Blacks in neighbor- 
hoods that are predominantly White and Blacks live separately from Whites in areas 
that are predominantly Black. To clarify, in most applications indices of uneven 
distribution are implemented as “aspatial” measures. In this application, the notion 
of separation implies only that the groups live in different areas. It does not imply 
that the different areas are necessarily spatially distant from each other. This would 
be the case when segregation involves large-scale clustering. But the index score 
would be the same if Whites and Blacks lived separately from each other in different 
areas forming a checker board pattern. 

The separation index also could be aptly termed the “contact difference” index, 
but that is a bit cumbersome. Alternatively, it could be named the “concentration” 
index following Stearns and Logan (1986), but Massey and Denton (1988) popular- 
ized the term “concentration” in association with another distinct dimension of seg- 
regation. So I adopt the term “separation index” (S) which emphasizes that the 
measure is sensitive to whether groups live apart from each other and are separated 
into different areas that differ fundamentally on group composition. 

The notion of group separation is closely connected with the notion of area or 
neighborhood racial polarization discussed by Stearns and Logan (1986).¹¹ As they
used it, polarization is high when the areas in which the two groups live fall primar- 
ily into two types. In the case of White-Black segregation that would be either pre- 
dominantly White or predominantly Black with few areas in between. Their usage 
of the term polarization directs attention to a neighborhood outcome. But polariza- 
tion of neighborhood racial composition has obvious implications for group differ- 
ences on residential outcomes for individuals. When areas are racially polarized, 
individuals in both groups primarily live in neighborhoods where members of their 


10 I derived this relationship independently. But I later discovered that the relationship had been
reported, based on a different derivation, in a paper by Becker et al. (1978) that unfortunately is not 
widely known or referenced. 

11 Stearns and Logan also used the term “concentration” to describe this aspect of uneven
distribution. It is an appealing term, but I use “polarization” instead because Stearns and Logan use it as a
synonym for concentration and because the influential methodological study by Massey and 
Denton (1988) used the term “concentration” to refer to a different aspect of segregation (relating 
to concentration in physical space). 
group predominate. In the example of White-Black segregation, Whites live
primarily in White neighborhoods and Blacks live primarily in Black neighborhoods. This
resonates with the idea that groups live separate and apart from each other, a neces- 
sary, but not sufficient, precondition for experiencing disparities on neighborhood 
residential outcomes other than racial composition per se (e.g., crime, social
disorder, inferior amenities, poor schools, and poor government services). Given
this close similarity of group separation and neighborhood polarization, it would not 
be unreasonable to call the separation index the “polarization” index. But I reserve 
that term for an alternative measure which I will introduce and discuss shortly. 

As noted earlier, I endorse Stearns and Logan’s view that the separation index (S) 
taps an aspect of uneven distribution that is sociologically important and is not con- 
sistently captured by other measures. In particular, the presence or absence of group 
separation is not captured well by the more widely used delta or dissimilarity index 
(D). It is interesting then to note that D is used much more widely than S. To be sure, 
the separation index has been used in segregation studies for many decades — for 
example, it was given close attention in Duncan and Duncan’s (1955) landmark 
article on segregation indices.¹² Moreover, it consistently receives high marks in
technical reviews of indices (e.g., Zoloth 1976; White 1986; Reardon and Firebaugh
2002). Additionally it has been shown to be far less susceptible than D to the vexing
problem of index bias (Winship 1977).¹³ Nonetheless, S is not used nearly as widely
as D in empirical studies and its attractive qualities appear not to be widely appreci- 
ated. What could explain this? 

At least three factors appear to be relevant. One is that the measure has never 
been consistently used under the same name and interpreted in a consistent way. 
This alone is likely to lead many people to underestimate both the frequency of its 
usage and the extent to which different researchers have endorsed its value for 
assessing segregation. 

Another factor is that much of the usage of the index has involved terminology 
and interpretations that do not highlight what I view as the separation index’s stron- 
gest feature for substantive interpretation. For example, the measure has been used 
most widely under the names “variance ratio”, “correlation ratio”, and “eta squared” 
in the literature. These names are not technically incorrect or inappropriate. But 
they also do not call attention to the measure’s most attractive characteristic — its 
ability to signal when group residential distributions are polarized such that the two 
groups live separately from each other with members of both groups living
primarily in areas where their group predominates. Instead, the names used in the past
direct attention toward substantive interpretations relating to the strength of the
individual-level, statistical association between the binary variable of race (coded


12 Duncan and Duncan referred to it as eta squared and the correlation ratio. The measure also is
discussed in Bell (1954), but the application there is to overall isolation instead of pairwise group 
comparisons. 

13 In Chap. 14 I amplify Winship’s early finding by showing that among all popular indices of
uneven distribution S is least susceptible to distortion by index bias while G and D are the most 
susceptible. 
0-1) and the categorical variable of area of residence. These “statistical”
interpretations are mathematically defensible, but they do not resonate with broad audiences
and researchers perhaps because their substantive relevance for group differences in
residential outcomes is neither obvious nor easy to convey.¹⁴

The difference of group means formulation of the separation index can poten- 
tially address these two points and enhance the attractiveness of S to researchers and 
broader audiences. The computation of S under the difference of means formulation 
is simple and easy to implement. In addition, this formulation of S has an appealing 
substantive interpretation that is easy to convey to both broad and technical audi- 
ences; it signals that uneven distribution involves groups residing separately from 
each other with both groups being disproportionately concentrated in racially polar- 
ized neighborhoods such that the two groups experience fundamentally different 
residential outcomes on area racial composition. Importantly, D, the most widely
used index of uneven distribution, does not provide a reliable signal for whether or
not this pattern of segregation is present. 


4.5 A Side Comment on the Separation Index (S) 
and Uneven Distribution 


A third factor that may help explain why the separation index has not been used 
more widely requires a longer discussion. It is that S is occasionally viewed as a 
measure of group isolation and exposure rather than a measure of uneven distribu- 
tion. At one level I view the controversy as minor because most technical reviews 
correctly characterize the separation index as a measure of uneven distribution (e.g., 
Zoloth 1976; James and Taeuber 1985; White 1986; Reardon and Firebaugh 2002). 
But there are contrasting descriptions of S in the literature so the issue warrants a 
brief side discussion. 

Massey and Denton (1988) categorize S (which they refer to as V and eta 
squared) as an “exposure” measure rather than a measure of uneven distribution. 
One reason they offer for doing so is that, unlike D and G, S does not have a definite 
relationship to the segregation curve. This concern should be set aside for two rea- 
sons. The first reason is that Massey and Denton themselves do not apply this crite- 
rion in a consistent way. For example, they classify the Theil entropy index (H) as 
an index of uneven distribution but, like S, H also does not have a definite relation- 
ship with the segregation curve. 

The second reason to set aside this concern is that many authoritative reviews of 
measures of uneven distribution disregard the segregation curve when evaluating 
indices (e.g., Zoloth 1976; Stearns and Logan 1986; Reardon and Firebaugh 2002). 


14 S is equal to the eta squared ($\eta^2$) statistic from an individual-level analysis of variance
predicting the mean of the binary variable of race (0-1) by area of residence (e.g., the categorical variable
of tract). Relatedly, S is equal to the square of the individual-level correlation between race (coded 
0-1) and p (computed for area of residence). 
Some important statements explicitly and forcefully dismiss the relevance of the
segregation curve altogether (White 1986; Coleman et al. 1982). I endorse these 
views. I recognize that the segregation curve is visually appealing and is
potentially a useful graphical tool for depicting group differences in residential
distribution. But it does not embody “the” definitive definition of uneven distribution and
it also has clear limitations and deficiencies. For example, the segregation curve 
does not, and it logically cannot, signal whether the two groups in the comparison 
live apart from each other in areas that differ in substantively important ways on 
area racial composition. Accordingly, most authoritative methodological reviews 
classify both H and S as valid measures of uneven distribution often noting features 
of these indices that make them attractive for many purposes (e.g., Zoloth 1976; 
James and Taeuber 1985; White 1986; Reardon and Firebaugh 2002). 

Another possible basis for Massey and Denton’s characterization of S as an 
exposure index is that, under certain circumstances, particular computing formulas
for S contain terms that are similar to terms found in formulas for exposure indexes. 
For example, many have noted that S has similarities to Bell’s (1954) revised index 
of isolation which involves terms that have exposure interpretations (e.g., Duncan 
and Duncan 1955; Becker, McPartland, and Thomas 1978; Iceland et al. 2002; 
James and Taeuber 1985: footnote 4; Stearns and Logan 1986). In the final analysis, 
however, it is clear that there are fundamental logical differences between S and 
exposure and isolation indices. 

The first important logical difference is the population comparison involved. 
Exposure and isolation indices are calculated by comparing group counts to counts 
for the full population, not just the two groups in the segregation comparison. In 
contrast, S, like other indices of uneven distribution, is calculated from “pairwise” 
counts; that is, it is calculated using only the counts for the two groups in the com- 
parison and is unaffected by the counts for other groups. The distinction can be 
crucially important in empirical applications because scores and substantive impli- 
cations of “overall” and “pairwise” isolation can be and often are quite different. 

The second important logical difference that distinguishes S from pairwise isola- 
tion indices is that the pairwise isolation term incorporated in some computing for- 
mulas for S is modified by a “normalizing” calculation. This calculation is crucially 
important to the issue at hand because it can and often does radically change its 
value. Equally importantly, the normalizing calculation also fundamentally changes 
the substantive interpretation of S. Specifically, S does not register the level of pair- 
wise isolation. It registers something distinctly different; S registers the relative 
extent to which pairwise isolation exceeds its expected value. This is fundamentally
different from pairwise isolation itself. The normalizing calculation in this
particular formulation of S has a crucially important consequence; it eliminates the
mathematical correspondence between isolation and group composition. Thus,
while group composition has important implications for the value of pairwise
isolation scores, it has no necessary or mathematically inherent implication for S. As a
result, S can take any value over its logical range of 0 to 1 under any arrangement
on group composition for the city. This is not the case for measures of isolation. 
They must take high values when the group in question is large in relative terms. 

In sum, S is fundamentally distinct from standard indices of isolation and expo- 
sure. Isolation terms found in some formulas for computing S are based on pairwise 
counts, not overall counts, and they are subject to a normalizing transformation that 
radically changes their value and eliminates any mathematical correspondence 
between city ethnic composition and the value of S. Consequently, one cannot reli- 
ably infer the value of either overall or pairwise isolation from knowledge of the 
score of S or vice versa. 

Given the confusion in the literature on this issue, it may be useful to consider the 
hypothetical example of a population with three groups — Whites, Blacks, and 
Latinos. Then assume that Whites live apart from Blacks and Latinos but that Blacks 
and Latinos live together. S will register the pattern of uneven distribution as high 
for both White-Black and White-Latino segregation and low for Black-Latino seg- 
regation. Importantly, this result will be the same regardless of the city racial com- 
position. In contrast, both overall isolation and also Bell’s revised index of isolation 
will vary depending on city racial mix. The revised index of isolation for Blacks will 
be higher in a city where Whites outnumber Latinos (e.g., Detroit and Cleveland) 
and it will be low in a city where Latinos outnumber Whites (e.g., El Paso or San 
Antonio). This issue is not narrowly academic; it can have important practical
consequences for index scores and substantive conclusions. It has taken on increasing
relevance in recent decades as the growth of the Latino and Asian populations has
resulted in more complex racial demography in many cities.
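
The three-group scenario can be illustrated with toy numbers. Everything in the sketch below is hypothetical: the two two-area cities, the counts, and the function names are assumptions chosen only to mirror the example, and the isolation measure shown is simple overall isolation (average proportion own-group in area among the full population), not Bell's revised index.

```python
def separation_index(group1, group2):
    """Pairwise S (0-1 scale): difference in mean contact with group 1."""
    N1, N2 = sum(group1), sum(group2)
    p = [a / (a + b) for a, b in zip(group1, group2)]
    return (sum(a * pi for a, pi in zip(group1, p)) / N1
            - sum(b * pi for b, pi in zip(group2, p)) / N2)

def overall_isolation(group, totals):
    """Average proportion own-group in area, using counts for the full population."""
    N = sum(group)
    return sum((g / N) * (g / t) for g, t in zip(group, totals))

# Hypothetical cities; each tuple is (White, Black, Latino) counts for one area.
city_a = [(900, 10, 5), (20, 100, 15)]     # few Latinos share Black areas
city_b = [(900, 10, 50), (20, 100, 650)]   # many Latinos share Black areas

for name, city in (("A", city_a), ("B", city_b)):
    white = [w for w, _, _ in city]
    black = [b for _, b, _ in city]
    totals = [sum(area) for area in city]
    print(name,
          round(separation_index(white, black), 3),
          round(overall_isolation(black, totals), 3))
```

White-Black S is identical in the two cities because it uses only the White and Black counts, while overall Black isolation is much lower in the city where Latinos are numerous.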

Finally, I close this discussion by stressing that S is not unusual among measures 
of uneven distribution in having linkages and interpretations relating to exposure 
and contact. To the contrary, one of the valuable insights gained from the difference 
of means formulations of indices of uneven distribution set forth in this monograph 
is that all popular indices of uneven distribution have direct and definite linkages to 
pairwise contact and exposure. Thus, for the example of White-Black segregation, 
both S and D can be obtained as simple group differences on exposure to Whites. In 
the case of S exposure is assessed directly by area proportion White (p). The only 
difference for D is that exposure is rescaled to either 0 or 1 depending on whether p 
equals or exceeds P. Thus, the key differences between indices of uneven distribu- 
tion are found not in whether the indices register contact and exposure — all popular 
indices do this. The key differences are found in how the specific indices scale
contact and exposure differently based on the way segregation-relevant residential
outcomes (y) are scored from area group proportion (p).

For the Theil index (H), the relevant function y = f(p) for scoring of residential
outcomes (y) is a continuous function of p in the manner of S, but the form of the
function is more complex. Specifically, the function y = f(p) is the following
continuous, ever-rising, nonlinear expression


$$ y_i = Q + \frac{(E - e_i)/E}{p_i/P - q_i/Q} $$


where $e_i$ is the entropy score for area i and E is the entropy score for the city as a
whole. These are given by the calculations $e_i = p_i \ln(1/p_i) + q_i \ln(1/q_i)$ and
$E = P \ln(1/P) + Q \ln(1/Q)$. I owe special thanks to Warner Henson, III for helping
me identify the form of this function.¹⁵ I review the formal basis for this scoring
of residential outcomes in Appendices.
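A short sketch of why this scoring reproduces H may be useful before turning to the appendix treatment. It assumes that each group mean is the household-weighted average of the area scores, and it introduces count notation purely for illustration: $n_{1i} = t_i p_i$, $n_{2i} = t_i q_i$, $N_1 = TP$, and $N_2 = TQ$. H is expressed here on the 0 to 1 scale.

```latex
\begin{aligned}
Y_1 - Y_2 &= \sum_i \frac{n_{1i}}{N_1}\, y_i - \sum_i \frac{n_{2i}}{N_2}\, y_i
           = \frac{1}{T} \sum_i t_i \left( \frac{p_i}{P} - \frac{q_i}{Q} \right) y_i \\
          &= \frac{Q}{T} \sum_i t_i \left( \frac{p_i}{P} - \frac{q_i}{Q} \right)
           + \frac{1}{T} \sum_i t_i \, \frac{E - e_i}{E} \\
          &= Q \,(1 - 1) + \sum_i \frac{t_i \,(E - e_i)}{E\,T} = H
\end{aligned}
```

The first sum vanishes because $\sum_i t_i p_i = TP$ and $\sum_i t_i q_i = TQ$. Areas where $p_i = P$ carry a weight of zero in the first line, so the 0/0 form that y takes at that single point does not affect the result.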

Theil and Finizza (1971) and Theil (1972) argue that information theory pro- 
vides an attractive conceptual grounding for using entropy calculations to assess 
segregation. But most researchers who use H adopt it on a narrower and more practical
basis. In particular, H is often used because it is mathematically tractable in
ways that facilitate decomposition analysis.¹⁶ The substantive relevance of area ($e_i$)
and city-level (E) entropy scores is seen narrowly as quantifying two-group racial
diversity, with the expression $(E - e_i)/E$ thus registering uneven distribution as
departure of area racial diversity from that which would occur under even distribution
given the racial mix of the city population.

I show below that the nonlinear relationship between y and p is visually simpler 
and more intuitively appealing than the mathematical expression introduced above 
might suggest. In its essence, the function maps p into y based on an ever-rising, 
backwards “S-curve”. The undulations of the S-curve and its symmetry, or lack 
thereof, vary with the relative sizes of the two groups in the comparison.¹⁷ When the
groups are identical in size, the undulations in the S-curve are moderate and the
resulting curve is symmetrical. In this situation results for H and S tend to track each
other very closely. When the two groups are unequal in size, the undulations in the
S-curve for the y-p relationship for H are asymmetrical and larger in amplitude and the
resulting curve departs from linearity in greater degree. In this situation results for
H and S may differ.


15 At the time, Mr. Henson was an undergraduate research assistant at Texas A&M University. At
the time of this writing, he is a sociology doctoral student at Stanford University.

16 Reardon and Firebaugh (2002) emphasize this property in arguing that H is attractive for
investigating multi-group segregation.

17 The graph for y = f(p) for G also tends to form a forward-leaning “S-curve”. However, it is
not a smooth curve; it is a series of small step functions that typically take the general form
of an S-curve.

Under this system for scoring segregation-relevant residential outcomes (y), H
can be obtained from ($Y_1 - Y_2$) and thus fits in the framework for measuring
segregation set forth in this paper. As in the previous examples considered, the difference
of means formulation of H shows that it can be interpreted in terms of scaled contact
and exposure. In the case of White-Black segregation, H is the White-Black difference
in average contact with Whites (p) scored on the basis of the nonlinear function
described above. When P and Q are balanced (i.e., P = Q = 50), the function is a
symmetrical backwards “S”. As a result, the measure responds less to differences in
p in the middle of its range (i.e., 25-75) and more to differences in the lower and
higher ranges of p. When P and Q are imbalanced, one must study the y-p relationship
to understand specifically how H responds differentially to contact over different
ranges of p. I discuss this in more detail below.
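The equivalence just described can be checked numerically. The following is a minimal sketch in Python, not the author's implementation: the five-area city is hypothetical, the function names are invented for illustration, and the value returned when p equals P (where the expression is 0/0) is an assumption made here to keep the curve continuous.

```python
import math

def entropy(p):
    """Two-group entropy p*ln(1/p) + q*ln(1/q), treating 0*ln(1/0) as 0."""
    return sum(x * math.log(1.0 / x) for x in (p, 1.0 - p) if x > 0.0)

def y_theil(p, P):
    """Score y = f(p) for H: Q + [(E - e)/E] / (p/P - q/Q)."""
    Q = 1.0 - P
    E = entropy(P)
    denom = p / P - (1.0 - p) / Q
    if abs(denom) < 1e-12:
        # p equals P gives 0/0. The limiting value is used here (an assumption);
        # any finite score would leave the difference of means unchanged.
        return Q + P * Q * math.log(P / Q) / E
    return Q + ((E - entropy(p)) / E) / denom

# Hypothetical toy city: households of group 1 and group 2 in five areas.
n1 = [90, 70, 40, 10, 30]
n2 = [10, 30, 60, 90, 20]
N1, N2 = sum(n1), sum(n2)
T = N1 + N2
P = N1 / T
t = [a + b for a, b in zip(n1, n2)]
p = [a / ti for a, ti in zip(n1, t)]

# Conventional computing formula for H (0 to 1 scale).
E = entropy(P)
H_standard = sum(ti * (E - entropy(pi)) for ti, pi in zip(t, p)) / (E * T)

# Difference of means: score each area, then average within each group.
y = [y_theil(pi, P) for pi in p]
Y1 = sum(a * yi for a, yi in zip(n1, y)) / N1
Y2 = sum(b * yi for b, yi in zip(n2, y)) / N2
H_from_means = Y1 - Y2

print(round(100 * H_standard, 2), round(100 * H_from_means, 2))
```

The two printed values should agree, and the scoring runs from 0 at p = 0 to 1 at p = 1 whatever the value of P.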


4.7 Scoring Residential Outcomes (y) for the Hutchens 
Square Root Index (R) 


At this point, only one measure of residential segregation that receives regular atten- 
tion in methodological studies of indices of uneven distribution has yet to be consid- 
ered. This is Atkinson’s index (A). While rarely used in empirical studies, it 
nevertheless has been discussed in several methodological studies of segregation 
indices. For example, James and Taeuber (1985) praise A for involving a user-specified
parameter (δ) which they argue can be used to “tune” the index to be sensitive
to particular regions of the segregation curve. Massey and Denton (1988) also
comment that this is a potentially interesting quality of A. In contrast, White (1986)
and Hutchens (2001, 2004) view this characteristic of A as undesirable. Indeed, they
characterize it as a fundamental flaw. They point out that A is “asymmetric” when δ
is set to any value other than 0.5 and argue that the property of asymmetry introduces
conceptual complications most would view as impractical if not fatal altogether
for general use of A in segregation research. For example, the property of
asymmetry implies that White-Black segregation can be logically and quantitatively
different from Black-White segregation. No one has endorsed this as a desirable
quality of segregation indices. I follow White and Hutchens in endorsing the principle
of symmetry for segregation indices and therefore limit my consideration of A
to only its symmetric implementation $A_{(0.5)}$, the special case where δ is set to 0.5.
Hereafter, my references to A are to this version so I drop the subscript. 

I have not been able to discover a way to express the value of A as a simple dif- 
ference of means on scores of y based on area group proportion scores (p). However, 
I have found a difference of means solution for an index that is a close conceptual 
and mathematical surrogate for A. The index I refer to is the Hutchens (2001, 2004) 
square root index (R). This index has gained currency in the study of occupational
sex segregation, but has not yet gained wide usage in studies of residential segregation.
Since it may be unfamiliar to some readers, I briefly note three equivalent
formulas for R.


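$$ R = 100 \cdot \left[ 1 - \sum_i (t_i/T) \sqrt{(p_i/P) \cdot (q_i/Q)} \right] $$


$$ R = 100 \cdot \left[ 1 - \sum_i (t_i/T) \sqrt{p_i q_i / PQ} \right] $$


The similarity to Atkinson’s A can be seen by comparing the last formula with the
following expression for A which obtains when the tuning parameter δ is set to 0.5.


$$ A_{(0.5)} = 100 \cdot \left[ 1 - \left\{ \sum_i (t_i/T) \sqrt{p_i q_i} \right\}^2 / PQ \right] $$


The close relationship of R and A also can be seen in the fact that the two map
onto each other based on the following exact nonlinear relationships (with both
indices expressed on the 0 to 1 scale)


$$ A = 2R - R^2 \quad \text{and} \quad R = 1 - \sqrt{1 - A} $$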


Values of R are numerically lower than values of A. But since the relationship of 
their scores is exact and continuous, the two indices yield identical rank-orderings 
of segregation comparisons. Hutchens (2001, 2004) argues that R is an attractive 
measure of segregation in its own right. I include R in the discussion here on that 
basis as well as because it is a close surrogate for A. Additionally, values of R have
a very strong relationship with values of D in empirical studies and R fares much
better than D in technical reviews.¹⁸
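The mapping between R and A can be illustrated with a small numerical check. This is a sketch under stated assumptions: the two cities are hypothetical, the function names are invented here, both indices are computed on the 0 to 1 scale, and the formula for A assumes the squared sum implied by the relationship A = 2R - R².

```python
import math

def hutchens_r(n1, n2):
    """R = 1 - sum (t_i/T) * sqrt(p_i q_i / PQ), on the 0 to 1 scale."""
    N1, N2 = sum(n1), sum(n2)
    T = N1 + N2
    P, Q = N1 / T, N2 / T
    total = 0.0
    for a, b in zip(n1, n2):
        t = a + b
        p, q = a / t, b / t
        total += (t / T) * math.sqrt(p * q / (P * Q))
    return 1.0 - total

def atkinson_half(n1, n2):
    """A with the tuning parameter at 0.5: 1 - {sum (t_i/T) sqrt(p_i q_i)}^2 / PQ."""
    N1, N2 = sum(n1), sum(n2)
    T = N1 + N2
    P, Q = N1 / T, N2 / T
    total = 0.0
    for a, b in zip(n1, n2):
        t = a + b
        p, q = a / t, b / t
        total += (t / T) * math.sqrt(p * q)
    return 1.0 - total ** 2 / (P * Q)

# Two hypothetical cities: the first is more unevenly distributed than the second.
cities = {
    "city_a": ([90, 70, 40, 10, 30], [10, 30, 60, 90, 20]),
    "city_b": ([60, 55, 50, 45, 30], [40, 45, 50, 55, 70]),
}

results = {}
for name, (n1, n2) in cities.items():
    R = hutchens_r(n1, n2)
    A = atkinson_half(n1, n2)
    results[name] = (R, A)
    # A recovered from R, and R recovered from A, should match the direct values.
    print(name, round(R, 4), round(A, 4), round(2 * R - R ** 2, 4), round(1 - math.sqrt(1 - A), 4))
```

Because the mapping is monotonic, the city with the higher value of R also has the higher value of A, and R never exceeds A.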

For Hutchens’ square root index (R), the relevant scoring of residential outcomes
(y) is a continuous, ever-rising, nonlinear function of p. Specifically, the function
y = f(p) is


$$ y_i = Q + \frac{1 - \sqrt{p_i q_i / PQ}}{p_i/P - q_i/Q} $$


where $p_i$, $q_i$, P, and Q are as introduced earlier. Under this system for scoring
residential outcomes (y), R can be obtained from ($Y_1 - Y_2$) and thus fits in the
framework for measuring segregation set forth in this paper. I establish the formal
basis for this scoring of residential outcomes for R in Appendices.


18 In the data sets I examine for this study, the square root of R consistently correlates with D at 0.99
or higher.

As with the other indices, this supports an interpretation of R in terms of scaled 
contact and exposure. In the case of White-Black segregation, R is the White-Black 
difference in average contact with Whites (p) scored on the basis of the nonlinear 
function shown above. Like H, the function produces a continuous, ever-rising, non- 
linear curve that forms a backwards “S”. Also like H, the undulations in the nonlin- 
ear curve vary with the relative sizes of the groups in the comparison. When the 
groups are identical in size, the undulations in the S-curve are modest and sym- 
metrical and the resulting curve is relatively close to linear. When the two groups 
are unequal in size, the undulations in the S-curve are asymmetrical and larger in 
amplitude and the resulting curve departs from linearity in greater degree. One must 
study the particular y-p relationship in each case to understand how R registers p
over different ranges of p.
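One way to study the y-p relationship for R is to trace the scoring function directly. The sketch below is illustrative only: the values of P, the toy city, and the function name are hypothetical, and the score of 0.5 returned when p equals P is the limiting value of the 0/0 expression, adopted here as an assumption.

```python
import math

def y_hutchens(p, P):
    """Score y = f(p) for R: Q + (1 - sqrt(p q / PQ)) / (p/P - q/Q)."""
    Q = 1.0 - P
    q = 1.0 - p
    denom = p / P - q / Q
    if abs(denom) < 1e-12:
        # p equals P gives 0/0; the limiting value is (P + Q)/2 = 0.5.
        return 0.5
    return Q + (1.0 - math.sqrt(p * q / (P * Q))) / denom

# Trace the curve for a balanced and an imbalanced comparison (hypothetical P values).
for P in (0.50, 0.90):
    curve = [round(100 * y_hutchens(k / 10, P), 1) for k in range(11)]
    print(P, curve)

# Hypothetical toy city used to confirm that R equals the difference of group means.
n1 = [90, 70, 40, 10, 30]
n2 = [10, 30, 60, 90, 20]
N1, N2 = sum(n1), sum(n2)
T = N1 + N2
P, Q = N1 / T, N2 / T
t = [a + b for a, b in zip(n1, n2)]
p = [a / ti for a, ti in zip(n1, t)]

R_standard = 1.0 - sum(ti * math.sqrt(pi * (1 - pi) / (P * Q)) for ti, pi in zip(t, p)) / T
y = [y_hutchens(pi, P) for pi in p]
R_from_means = (sum(a * yi for a, yi in zip(n1, y)) / N1
                - sum(b * yi for b, yi in zip(n2, y)) / N2)
print(round(100 * R_standard, 2), round(100 * R_from_means, 2))
```

With P = 0.50 the printed curve is symmetric about the midpoint, so scores at p and 1 - p sum to 100; with P = 0.90 the symmetry is lost.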

Hutchens (2001, 2004) argues R is an attractive index in part because it orders 
aggregate segregation scores in a manner consistent with the principle of segrega- 
tion curve dominance advocated by James and Taeuber (1985). The Atkinson index 
(A), the Gini index (G), and the dissimilarity index (D) all also satisfy this principle. 
Accordingly, scores for R tend to correlate closely with scores for G and D and
especially with scores for A.¹⁹ As I noted earlier, however, the principle of segregation
curve dominance is controversial and only a few methodological reviews
endorse it. One reason for this mentioned earlier is that defining segregation in
relation to the segregation curve eliminates two popular indices, the Theil entropy
index (H) and the separation index (S), that both have attractive features to recommend
them and that both fare well in technical reviews.

In essence, the principle of segregation curve dominance requires that indices 
place segregation comparisons involving non-crossing segregation curves in the 
same order as would result from segregation curve analysis.²¹ Some methodological
reviews explicitly reject the principle. Most reviews are less direct but, while not
explicitly taking a position on the issue, they implicitly reject the principle by giving
favorable evaluations of H and S which do not have the property.²² I view the principle
of segregation curve dominance as undesirable because it assigns priority to
segregation indices that are necessarily insensitive to group residential separation
and neighborhood polarization. I emphasize the word “necessarily” because segregation
curves register rank order differences between groups on area group
proportion (p) without regard to whether group differences on p are large or small
in magnitude. As a result, segregation curves can signal high levels of uneven distribution
when group residential separation and neighborhood polarization are low.
I view this with great concern and accordingly review the issue in more detail in the
next chapter.


19 Analyses of White-Minority segregation for core-based statistical areas reported later in this
monograph document close, mildly nonlinear relationships among G, $A_{(0.5)}$ and R.

20 Most controversially, it assigns logical primacy to the segregation curve (a graphical and
geometric representation of group differences in cumulative rank distribution on area group
proportions) without a compelling conceptual-theoretical basis for doing so.

21 The principle does not specify how an index should rank segregation comparisons when
segregation curves cross.

22 Coleman et al. (1982) explicitly reject the principle. White (1986) also questions its value.
Reardon and Firebaugh (2002), Zoloth (1976), Stearns and Logan (1986), and others ignore the
principle but praise measures such as Theil’s entropy index (H) and the separation index (S), giving
no concern to the fact that these indices do not conform to the principle of segregation curve
dominance.

For now I conclude this discussion by arguing that it is important for researchers to 
at least have the option of focusing on uneven distribution that involves group separa- 
tion and neighborhood polarization. In my view segregation that separates groups into 
residing apart from each other in different neighborhoods that differ fundamentally on 
racial composition is substantively compelling. Separation conceived in this way is a 
logical prerequisite for group disparities on neighborhood residential outcomes such 
as quality of schools, exposure to crime and social problems, availability and quality 
of services, etc. In contrast, uneven distribution that does not involve group separation 
and polarization does not necessarily create the logical potential for group differences 
on these kinds of stratification-related neighborhood outcomes. 

To summarize, in this chapter I reviewed how five indices of uneven distribution — 
G, D, R, H, and S — all can be specified as differences of group means on residential 
outcomes (y) scored from area group proportion (p). These five indices represent the 
most popular, widely used, and carefully studied indices of uneven distribution in the 
literature on segregation measurement. Consequently, I conclude that all popular 
indices of uneven distribution have ready interpretations as measures of group differ- 
ences in contact and exposure. All of the indices indicate that groups experience the 
maximum possible average difference on contact outcomes when uneven distribu- 
tion is complete and groups live completely apart. Similarly, all of the indices indi- 
cate that groups experience identical contact outcomes under conditions of even 
distribution. From this vantage point, the substantive differences between the indices 
ultimately trace to one thing: the differences among them in how they register group
differences on individual residential contact outcomes (y) in the intermediate ranges.
The scaling function y = f(p) for each index provides insight into this. Accordingly,
I review how this function varies across indices in the next chapter.


Chapter 5
Index Differences in Registering Area
Group Proportions


My goal in this chapter is to help interested readers become more familiar with the 
residential outcomes for individuals and households that additively determine the 
scores of different indices of uneven distribution. To do so, I review the residential 
outcome scores that underlie segregation comparisons in the difference of means 
formulation, looking in detail at the segregation comparisons of Whites with
Blacks, Latinos, and Asians in Houston, Texas in 2000. The data for these comparisons
are taken from block group tabulations for families obtained from Summary
File 3 of the 2000 census.¹ Table 5.1 presents the basic demographic information for
the four groups and the three segregation comparisons considered here. The results
for “overall” percentages document that Whites (non-Hispanic) are the largest
group at 52.7 % overall, followed by Latinos (24.8 %), Blacks (16.4 %), and Asians
(4.7 %). The results also document that the pairwise percentages for any group comparison
are always higher than overall percentages for the obvious reason that
groups outside the comparison are excluded from the denominator in the
calculations.
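The distinction between overall and pairwise percentages can be made concrete with the family counts reported in Table 5.1. The sketch below is a simple illustration; the variable and function names are invented here, and the only inputs are the counts and the total from the table.

```python
# Family counts from Table 5.1 (Houston, Texas, 2000).
counts = {"White": 627_613, "Black": 195_928, "Latino": 294_931, "Asian": 55_746}
# Table total; it exceeds the sum of the four groups listed above.
total_families = 1_191_102

# Overall percentages use all families in the denominator.
overall = {group: 100.0 * n / total_families for group, n in counts.items()}

def pairwise_percentages(minority, reference="White"):
    """Percentages of the reference group and the minority group among the two combined."""
    pair_total = counts[reference] + counts[minority]
    return (100.0 * counts[reference] / pair_total,
            100.0 * counts[minority] / pair_total)

for group, pct in overall.items():
    print(group, "overall:", round(pct, 1))
for minority in ("Black", "Latino", "Asian"):
    white_pct, minority_pct = pairwise_percentages(minority)
    print("White-" + minority, "pairwise:", round(white_pct, 1), round(minority_pct, 1))
```

Dropping the groups outside the comparison shrinks the denominator, which is why each pairwise percentage is higher than the corresponding overall percentage.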

Table 5.2 lists the values of G, D, R, H, and S obtained using standard computing 
formulas given in James and Taeuber (1985) for D, G, S and H, and a comparable 
formula for R adapted from Hutchens (2001) (reviewed in Appendices). 


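$$ D = 100 \cdot \frac{\sum_i t_i \, |p_i - P|}{2TPQ} $$


$$ G = 100 \cdot \frac{\sum_i \sum_j t_i t_j \, |p_i - p_j|}{2T^2PQ} $$


$$ S = 100 \cdot \frac{\sum_i t_i \, (p_i - P)^2}{TPQ} $$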


1 Specifically, I draw on Table 160 (A-I) which tabulates families by race, poverty status, family
structure, and presence of related children under age 18 for block groups. In this tabulation White
is Non-Hispanic White, Black and Asian counts include Hispanics, and Latinos are of any race.
The information in the tabulations pertaining to social characteristics of poverty and family status
is not used here, but is used in analyses presented in Chap. 9.


Table 5.1 Group counts and overall and pairwise group percentages for Houston, Texas, 2000

                              Percentage     White-Black    White-Latino   White-Asian
                              among all      pairwise       pairwise       pairwise
Group     N of families       families       percentage     percentage     percentage
White           627,613           52.7           76.2           68.0           91.8
Black           195,928           16.4           23.8              -              -
Latino          294,931           24.8              -           32.0              -
Asian            55,746            4.7              -              -            8.2
Total         1,191,102          100.0          100.0          100.0          100.0

Source: US Census 2000, Summary File 3


Table 5.2 Scores for White-Minority segregation indices obtained using standard computing
formulas, Houston, Texas, 2000

Group comparison                  G        D        R        H        S
Computed using standard formulas
White-Black segregation         87.07    70.97    47.02    53.59    57.39
White-Latino segregation        74.19    58.37    28.11    35.46    40.96
White-Asian segregation         76.28    58.22    34.96    31.31    23.88

Source: US Census 2000, Summary File 3


$$ H = 100 \cdot \frac{\sum_i t_i \, (E - e_i)}{ET} $$


$$ R = 100 \cdot \left( 1 - \frac{\sum_i t_i \sqrt{p_i q_i / PQ}}{T} \right) $$


Terms are defined as noted earlier (and also summarized in Appendices). In this 
particular analysis, the five segregation indices — G, D, R, H, and S — yield generally 
similar overall patterns of aggregate segregation between Whites and the three non- 
White groups. For example, all five indices show that substantial segregation is 
evident in each comparison. Similarly, all five indices show that White-Black segregation
is the highest of the three segregation comparisons examined. There is one
notable difference in how the measures portray patterns of aggregate
segregation. D, G, and R indicate that White-Latino segregation and White-Asian
segregation are roughly similar, whereas H and S indicate that White-Asian segregation is
substantially lower than White-Latino segregation.
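The five standard computing formulas can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code used for Table 5.2: the function name and the input format (two lists of area counts, one per group, with no empty areas) are choices made here.

```python
import math

def entropy(p):
    """Two-group entropy p*ln(1/p) + q*ln(1/q), treating 0*ln(1/0) as 0."""
    return sum(x * math.log(1.0 / x) for x in (p, 1.0 - p) if x > 0.0)

def standard_indices(n1, n2):
    """Return G, D, R, H, and S (0 to 100 scale) from area counts for two groups."""
    N1, N2 = sum(n1), sum(n2)
    T = N1 + N2
    P, Q = N1 / T, N2 / T
    t = [a + b for a, b in zip(n1, n2)]
    p = [a / ti for a, ti in zip(n1, t)]
    areas = list(zip(t, p))
    E = entropy(P)

    D = sum(ti * abs(pi - P) for ti, pi in areas) / (2 * T * P * Q)
    G = sum(ti * tj * abs(pi - pj)
            for ti, pi in areas for tj, pj in areas) / (2 * T ** 2 * P * Q)
    S = sum(ti * (pi - P) ** 2 for ti, pi in areas) / (T * P * Q)
    H = sum(ti * (E - entropy(pi)) for ti, pi in areas) / (E * T)
    R = 1.0 - sum(ti * math.sqrt(pi * (1.0 - pi) / (P * Q)) for ti, pi in areas) / T
    return {name: 100.0 * value for name, value in zip("GDRHS", (G, D, R, H, S))}

# Hypothetical examples: complete separation, exact even distribution, and a mixed case.
complete = standard_indices([50, 50, 0, 0], [0, 0, 30, 70])
even = standard_indices([30, 60, 90], [10, 20, 30])
mixed = standard_indices([90, 70, 40, 10, 30], [10, 30, 60, 90, 20])
for label, scores in (("complete", complete), ("even", even), ("mixed", mixed)):
    print(label, {k: round(v, 2) for k, v in scores.items()})
```

All five indices reach 100 when the groups live completely apart and 0 under even distribution, so they differ only in how they score the intermediate cases.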


5.1 Segregation as Group Differences in Individual 
Residential Attainments 


I next present results that demonstrate how the scores of the aggregate segregation 
indices can be obtained from simple differences of group means on residential 
attainments. Table 5.3 lists the values of D, G, S, H, and R calculated using the difference
of means formulations introduced in this monograph. The three panels in
the table report results separately for the White-Black, White-Latino and White-Asian
segregation comparisons, respectively. The first step in generating these
results is to calculate the residential outcome scores (y) at the block group level. I
obtain these by applying the relevant index-specific scaling function y = f(p) to the
value of pairwise proportion White (p) at the block group level. The second step is
to calculate the group-specific means for scaled contact with Whites (y). The resulting
values are reported in Table 5.3. The last step is to calculate the difference of the
group-specific means which also are reported in Table 5.3.


Table 5.3 Details for obtaining scores for White-Minority segregation from difference of group
means on residential outcomes, Houston, Texas, 2000

Residential outcome scored from           Mean for    Mean for    White-Minority
index-specific scaling function y=f(p)    Whites      Minority    difference
White-Black segregation
  y scored for G/2 (x100)*                  60.36       16.82       G=87.08
  y scored for G (x200)                    120.72       33.65       G=87.07
  y scored for D/2 (x100)*                  58.44       22.96       D=70.96
  y scored for D (x100)                     87.73       16.75       D=70.98
  y scored for R (x100)                     72.98       25.96       R=47.02
  y scored for H (x100)                     82.57       28.98       H=53.59
  y scored for S (x100)                     89.86       32.48       S=57.38
White-Latino segregation
  y scored for G/2 (x100)*                  61.86       24.76       G=74.20
  y scored for G (x200)                    123.72       49.53       G=74.19
  y scored for D/2 (x100)*                  59.33       30.14       D=58.38
  y scored for D (x100)                     81.49       23.12       D=58.37
  y scored for R (x100)                     63.37       35.26       R=28.11
  y scored for H (x100)                     72.95       37.50       H=35.45
  y scored for S (x100)                     81.12       40.17       S=40.95
White-Asian segregation
  y scored for G/2 (x100)*                  53.11       14.97       G=76.28
  y scored for G (x200)                    106.22       29.94       G=76.28
  y scored for D/2 (x100)*                  52.38       23.27       D=58.22
  y scored for D (x100)                     75.15       16.93       D=58.22
  y scored for R (x100)                     70.70       35.74       R=34.96
  y scored for H (x100)                     83.47       52.16       H=31.31
  y scored for S (x100)                     93.79       69.91       S=23.88

Source: US Census 2000, Summary File 3
* For these scorings of y for G and D, the values of G and D are given by 2·($Y_1 - Y_2$)

The results are straightforward. The values of the differences of means equal the 
values of the index scores reported in Table 5.2. Any apparent differences reflect 
only rounding error and would disappear if the results were reported to greater precision.
Of course, the index scores reported in Table 5.3 are redundant with the
results already presented and do not themselves provide any new insights into
segregation patterns. But presenting the detailed results documents that the
difference-of-means formulas yield the same results as the conventional formulas.
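The three steps (score areas, average by group, take the difference) can be sketched for the two simplest scalings: y = p for S, and y = 1 when p equals or exceeds P (0 otherwise) for D. The toy counts and function names below are hypothetical; the sketch only illustrates that the difference of means matches the conventional formulas.

```python
def difference_of_means(n1, n2, scale):
    """Score areas with y = scale(p, P), average by group, and take the difference."""
    N1, N2 = sum(n1), sum(n2)
    P = N1 / (N1 + N2)
    y = [scale(a / (a + b), P) for a, b in zip(n1, n2)]   # step 1: area scores
    Y1 = sum(a * yi for a, yi in zip(n1, y)) / N1         # step 2: group means
    Y2 = sum(b * yi for b, yi in zip(n2, y)) / N2
    return Y1, Y2, Y1 - Y2                                # step 3: difference

def scale_S(p, P):
    """Separation index: contact is scored directly as p."""
    return p

def scale_D(p, P):
    """Dissimilarity index: contact is rescaled to 1 if p >= P, else 0."""
    return 1.0 if p >= P else 0.0

def conventional_S_and_D(n1, n2):
    """Standard computing formulas for S and D on the 0 to 1 scale."""
    N1, N2 = sum(n1), sum(n2)
    T = N1 + N2
    P, Q = N1 / T, N2 / T
    t = [a + b for a, b in zip(n1, n2)]
    p = [a / ti for a, ti in zip(n1, t)]
    S = sum(ti * (pi - P) ** 2 for ti, pi in zip(t, p)) / (T * P * Q)
    D = sum(ti * abs(pi - P) for ti, pi in zip(t, p)) / (2 * T * P * Q)
    return S, D

# Hypothetical area counts for the two groups in the comparison.
n1 = [90, 70, 40, 10, 30]
n2 = [10, 30, 60, 90, 20]

S_conv, D_conv = conventional_S_and_D(n1, n2)
_, _, S_means = difference_of_means(n1, n2, scale_S)
_, _, D_means = difference_of_means(n1, n2, scale_D)
print(round(100 * S_conv, 2), round(100 * S_means, 2))
print(round(100 * D_conv, 2), round(100 * D_means, 2))
```

Only the scaling function changes from one index to the next; steps two and three are identical for every index.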

I noted in the previous chapter that the segregation-relevant residential outcomes 
(y) that determine the group means ($Y_1$ and $Y_2$) are index-specific scores for scaled
pairwise contact with Whites. The exact scoring of residential outcomes (y) varies
from index to index, but values of y always are a positive, monotonic function of
pairwise proportion White (p) for the household’s or individual’s area of residence.
The minority group’s average pairwise contact with Whites cannot exceed that
observed for Whites and it can reach parity only under the condition of exact even
distribution. When there is departure from even distribution, mean contact with
Whites for Whites will diverge from mean contact with Whites for the minority
group. The average magnitude of the difference will be reflected in the difference of
means ($Y_1 - Y_2$) which will yield the index score. Given this, it is instructive to
consider how the different index-specific residential outcome scores compare to 
each other. 

Figure 5.1 plots the values of residential attainment scores (y) by values of pairwise
proportion White (p) for G, D, R, H, and S for the three White-Minority segregation
comparisons presented in Table 5.3. These plots provide a basis for gaining
insight into how each segregation index registers residential contact outcomes (y)
based on pairwise area racial mix (p). I begin with the scores for the separation
index (S) because they are the easiest to describe. The scaling function y = f(p) for
S maps y directly to the values of p producing a diagonal line rising from (0,0) to
(100,100) in all three graphs. As a result, it is very easy to interpret the relationship
between y and contact with Whites (p); a one-point change in contact with Whites
translates into a one-point change in y. Thus, the graph for the White-Black comparison
indicates that a Black family that moves from a 20 % White area to a 70 %
White area would experience an increase of 50 points on scaled contact with Whites.
The graph for the White-Latino comparison shows that the same would be true for
a Latino family moving from a 20 % White area to a 70 % White area and the graph
for the White-Asian comparison shows that the same would be true for an Asian
family moving from a 20 % White area to a 70 % White area. This similarity of
change in y by change in p is not observed for the other indices because their scaling
functions are nonlinear.


[Figure 5.1: three panels (White-Black, White-Latino, White-Asian). Horizontal axis in each
panel: Area Group Proportion p = n1/(n1+n2) (x100), from 0 to 100.]

Fig. 5.1 Scoring residential outcomes (y) from pairwise proportion White (p) to compute G, D, R,
H, and S as a difference of means. Legend for index-specific curves: y scored for G/2, gray, long
dashes; y scored for D/2, gray, short dashes; y scored for R, gray, solid line; y scored for H,
dark, long dashes; y scored for S, dark, solid line

The scaling function y = f(p) for the Theil index (H) converts values of p to 
values of y that fall on a smooth, ever-rising, backwards “S-curve”. In these graphs
the departure from linearity is not dramatic, especially in comparison to what
will be seen for some other indices. Accordingly, the values of residential attainment
scores (y) relevant for H tend to be relatively close to residential attainment
scores (y) relevant for S. This provides a new insight into why scores for the separation
index (S) tend to correlate more closely with the scores of the Theil index (H)
than with the scores of other indices. Looking across the three segregation compari- 
sons one can see that the nonlinearity is most pronounced in the White-Asian com- 
parison and least pronounced in the White-Latino comparison. This is because 
nonlinearity in the y-p relationship for residential attainment scores (y) relevant for 
H will be less pronounced when the two groups in the comparison are more equal in 
size and more pronounced when one group is substantially larger than the other. As 
a result, residential outcomes scores (y) for H and S tend to track each other more 
closely when the two groups in the comparison are comparable in size and less 
closely when the groups are unequal in size. 

The nonlinearity in the y-p relationship for H just described has another implication.
It means that a change of a fixed amount in contact with Whites (p) will translate
into different amounts of change in y for H depending on two factors: the initial
starting value of p and the relative size of the two groups. Thus, inspection of the three
graphs in Fig. 5.1 indicates that a family that moves from an area that is 20 % White
to an area that is 70 % White would experience an increase of 35.9 points
on scaled contact with Whites (y) in the White-Black comparison, 36.9 points in the
White-Latino comparison, and 31.8 points in the White-Asian comparison. The
change in scaled contact for the White-Asian comparison is smallest because the
White-Asian group size comparison is the most imbalanced. This leads to greater
nonlinearity in the y-p relationship and smaller changes in y when moving from 20
to 70 on p. In contrast, the White-Latino group size comparison is the most balanced
of the three and leads to milder nonlinearity in the y-p relationship and larger
changes in y as p moves from 20 to 70.

In each group comparison the changes in y as p moves from 20 to 70 are smaller 
than the 50 point increase in y observed for S for the same group comparisons. This 
is because the y-p relationship is linear for S and nonlinear for H. The nonlinearity 
in the y-p relationship for H creates a large region in the middle portion of the range 
of p where the slope of the curve is less than 1.0 and thus changes in y are smaller
than changes in p.² In addition, the degree to which changes in y are smaller than
changes in p varies across the three segregation comparisons because the nonlinearity
in the y-p relationship varies; specifically, the departure from linearity is more
pronounced when the two groups in the comparison are more unequal in size and 
thus changes in y over the middle range of p are smaller in these group 
comparisons. 

The function y =f (p) for the Hutchens index (R) also generates values of y that 
fall on a smooth, ever-rising, backwards “S-curve”. The curve is similar in form to 
the curve seen for the Theil index (H). But the nonlinearity in the curve for R is 
noticeably more pronounced. Accordingly, the patterns for the scoring of y for R are 
similar to those just noted for H, but “amplified”. For example, as with H, changes 
of a fixed amount in contact with Whites (p) translate into different impacts on y 
depending on the initial starting value of p and relative size of the two groups. Thus, 
the graphs in Fig. 5.1 indicate that a family that moves from an area that is 20 %
White to an area that is 70 % White would experience an increase of 24.2
points on scaled contact with Whites (y) in the White-Black comparison, 25.7 points 
in the White-Latino comparison and 18.3 points in the White-Asian comparison. 
The changes in y are even smaller than the changes in y noted for H because the 
departure from linearity in the y-p relationship for R is greater. This “flattens” the 
y-p curve over the middle range of p even more and causes changes in y to be 
smaller than changes in p. As seen with H, the changes in y vary across the different 
segregation comparisons; they are larger when groups are more equal in size and 
smaller when groups are more unequal in size. 
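The changes in scaled contact for a move from 20 to 70 on p can be approximated directly from the scoring functions for S, H, and R. The sketch below is illustrative: it uses the pairwise proportions White from Table 5.1 rounded to three decimals, the function names are invented here, and it does not handle the special case p = P (which does not arise for these moves).

```python
import math

def entropy(p):
    """Two-group entropy p*ln(1/p) + q*ln(1/q), treating 0*ln(1/0) as 0."""
    return sum(x * math.log(1.0 / x) for x in (p, 1.0 - p) if x > 0.0)

def y_S(p, P):
    """Separation index scoring: y equals p."""
    return p

def y_H(p, P):
    """Theil index scoring: Q + [(E - e)/E] / (p/P - q/Q); undefined at p = P."""
    Q = 1.0 - P
    E = entropy(P)
    return Q + ((E - entropy(p)) / E) / (p / P - (1.0 - p) / Q)

def y_R(p, P):
    """Hutchens index scoring: Q + (1 - sqrt(pq/PQ)) / (p/P - q/Q); undefined at p = P."""
    Q = 1.0 - P
    return Q + (1.0 - math.sqrt(p * (1.0 - p) / (P * Q))) / (p / P - (1.0 - p) / Q)

# Pairwise proportion White (P) for each comparison, from Table 5.1.
pairwise_P = {"White-Black": 0.762, "White-Latino": 0.680, "White-Asian": 0.918}

def change_in_y(score, P, start=0.20, end=0.70):
    """Change in scaled contact (x100) for a move from `start` to `end` on p."""
    return 100.0 * (score(end, P) - score(start, P))

changes = {
    comparison: {name: round(change_in_y(f, P), 1)
                 for name, f in (("S", y_S), ("H", y_H), ("R", y_R))}
    for comparison, P in pairwise_P.items()
}
for comparison, row in changes.items():
    print(comparison, row)
```

Under these assumptions the computed changes come out within roughly a tenth of a point of the values quoted in the text, with S at 50 points in every comparison and the changes for R smaller than those for H.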

The function y = f(p) for the Gini index (G/2) also produces an ever-rising,
backwards “S-curve”. However, in contrast to the functions for H and R, this curve
is irregular rather than smooth. This is because G tracks percentile scores for p and
these depend not on the specific value of contact with Whites (p) itself, but instead
on how values of p translate into rank position on contact with Whites. In the case
of White-Black segregation, for example, this is determined by the number of
Whites and Blacks living in areas where p is higher and the number of Whites and
Blacks living in areas where p is lower. The nonlinearity of the function for G/2 is
more pronounced than that seen for the functions for H and R and this produces
larger departures from the diagonal line for S. As a result, it is reasonable to say that
scoring y as the percentile transformation of p is the most “dramatic” rescaling of
contact of those considered here. Thus, the graphs in Fig. 5.1 indicate that a family
that moves from an area that is 20 % White to an area that is 70 % White
would experience an increase of 13.2 points on scaled contact with Whites (y) in the
White-Black comparison, 27.7 points in the White-Latino comparison, and 6.4
points in the White-Asian comparison. In each case, the changes in y are even
smaller than the changes in y seen for H and R because the pronounced nonlinearity
in the y-p relationship for G “flattens” the y-p curve over the middle range of p quite
dramatically causing changes in y to be much smaller than changes in p. As observed
previously for H and R, the changes in y vary across the different segregation comparisons
with changes being larger when groups are more similar in size and smaller
when groups are more unequal in size. Thus, the change in y for the White-Latino
comparison, where the two groups are more similar in size, is more than four times larger
than the change in y for the White-Asian comparison where the two groups are more
unequal in size.


2 For example, from inspection of the figure for the White-Black comparison changes in y are
smaller than changes in p for the portion of the curve between approximately 15 and 85 on p.
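The percentile scoring for G can be sketched as follows. This is an assumption-laden illustration rather than the author's code: it takes the percentile score of an area to be the share of households in the comparison living in areas with lower p plus half the share living in areas with the same p (a mid-rank convention adopted here), and it uses hypothetical area counts.

```python
def percentile_scores(n1, n2):
    """Mid-rank percentile of each area on p among all households in the comparison."""
    t = [a + b for a, b in zip(n1, n2)]
    T = sum(t)
    p = [a / ti for a, ti in zip(n1, t)]
    scores = []
    for pi in p:
        below = sum(tj for tj, pj in zip(t, p) if pj < pi)   # households in lower-p areas
        tied = sum(tj for tj, pj in zip(t, p) if pj == pi)   # households in same-p areas
        scores.append((below + 0.5 * tied) / T)
    return scores

def gini_standard(n1, n2):
    """Standard computing formula for G on the 0 to 1 scale."""
    N1, N2 = sum(n1), sum(n2)
    T = N1 + N2
    P, Q = N1 / T, N2 / T
    t = [a + b for a, b in zip(n1, n2)]
    p = [a / ti for a, ti in zip(n1, t)]
    areas = list(zip(t, p))
    return sum(ti * tj * abs(pi - pj)
               for ti, pi in areas for tj, pj in areas) / (2 * T ** 2 * P * Q)

# Hypothetical area counts; the first two areas share the same value of p (0.9).
n1 = [90, 45, 70, 40, 10, 30]
n2 = [10, 5, 30, 60, 90, 20]

y = percentile_scores(n1, n2)
Y1 = sum(a * yi for a, yi in zip(n1, y)) / sum(n1)
Y2 = sum(b * yi for b, yi in zip(n2, y)) / sum(n2)
G_from_means = 2.0 * (Y1 - Y2)   # G is twice the difference under the G/2 scoring
G_standard = gini_standard(n1, n2)
print(round(100 * G_standard, 2), round(100 * G_from_means, 2))
```

Because the scores depend only on rank position, rescaling the values of p without changing their order leaves y, and therefore G, unchanged.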

In contrast to S, H, R, and G, the scoring of y for the index of dissimilarity (D) is 
not ever-rising as p increases. Instead, it follows a simple, two-value, monotonic 
step function. The scoring of y for D/2 shown in the graphs draws on the formulation
of D as a version of G computed from a two-category ranking scheme with
areas where p ≥ P being in the higher ranking category and all other areas being in
the lower ranked category. For example, in the White-Black comparison, y is scored
14.6 when p < P and 64.6 when p ≥ P.³ The scoring of y for D could alternatively
be shown as a step function where values of y are either 0 or 100 depending on
whether p equals or exceeds P. But I present the D/2 formulation here to facilitate the
comparison of D with G. 
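
The two values of y under the D/2 formulation follow directly from the share of households in each ranking category. The Python sketch below is only an illustration of that arithmetic; the function name `d_half_scores` is hypothetical, and the only input taken from the text is the 29.2% share of households in the lower ranking category for the White-Black comparison.

```python
def d_half_scores(share_lower):
    """Return (y_low, y_high), the average percentile scores for the two
    ranking categories used in the D/2 formulation.

    share_lower is the percentage of households living in areas where
    p < P; all remaining households live in areas where p >= P.
    """
    share_upper = 100.0 - share_lower
    y_low = share_lower / 2.0                  # midpoint of the lower category
    y_high = share_lower + share_upper / 2.0   # midpoint of the upper category
    return y_low, y_high


y_low, y_high = d_half_scores(29.2)
print(round(y_low, 1), round(y_high, 1))   # 14.6 64.6
print(round(y_high - y_low, 1))            # 50.0
```

Whatever the split between the two categories, the gap between the two scores is 50 points, which is why 50.0 is the maximum possible change in y under the D/2 formulation.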

The step function for D/2 produces a rescaling of contact that responds to changes 
in p only when p crosses from being below P to equaling or exceeding it. As the graph 
in Fig. 5.1 indicates, this does not occur when a family moves from an area that is 
20% White to an area that is 70% White in the White-Black comparison. 
So a family making this move would experience no change in scaled contact with 
Whites (y); y is 14.6 when p is 20 and y remains at this value when p is 70. The same 
is true in the White-Asian comparison. In contrast, the change in y for a family mak- 
ing a comparable move in the White-Latino comparison would be 50.0 points (the 
maximum possible change under the D/2 formulation). 

These results highlight two things about D. They highlight that D responds to 
changes in p only when p crosses a specific value and otherwise D is insensitive to 
changes in p. The examples also highlight that the value of p that D responds to dif- 
fers from one segregation comparison to another based on group size. Thus, when 
groups are equal in size, D responds to changes in p at 50% White and when the 
minority group is smaller in size, D responds to changes in p at increasingly higher 
levels. Thus, the 50-point change in y occurs when p crosses from below to above 
68.0 in the White-Latino comparison, from below to above 76.2 in the White-Black 
comparison, and from below to above 91.8 in the White-Asian comparison. 

³ The value of 14.6 indicates that the areas in the lower ranking category contain 29.2% of the 
households in the analysis and thus have an average percentile score of 14.6. The value of 64.6 is 
based on the average percentile score for the 70.8% of households that are in the higher ranking 
category; that is, 29.2 + 70.8/2 = 64.6. 


5.2 Implications for Sensitivity to Separation and Polarization 


The patterns just reviewed provide an intuitive basis for comparing indices of 
uneven distribution and placing them on a continuum. One end of the continuum is 
anchored by the separation index (S). The y-p relationship for S is linear. So it reg- 
isters group differences in pairwise contact (p) in its original metric. This is well- 
suited for measuring group separation and neighborhood polarization. If the group 
means on p differ by a large amount, it follows that groups live apart from each 
other with members of each group living in neighborhoods where their group pre- 
dominates. If the group means on p are similar, it follows that the groups live 
together, not apart, and thus share similar neighborhood outcomes on pairwise 
racial mix (p). 

The other end of the continuum is anchored by the gini index (G). The y-p rela- 
tionship for G is profoundly nonlinear. This is because it does not register group 
differences in pairwise contact (p) in its original metric. Instead, the scoring 
function converts the level of actual contact into a score for rank order position 
via the percentile transformation. This is well-suited for measuring ordinal differ- 
ences in group contact. But it is ill-suited for measuring group separation and neigh- 
borhood polarization. Accordingly, if group means on percentile scores (y) based on 
pairwise group contact (p) differ by a large amount in White-Minority comparisons, 
one can safely conclude that Whites consistently live in neighborhoods that rank 
higher on proportion White than do minorities. But, one cannot conclude that the 
minority group lives apart from Whites in neighborhoods where the minority group 
predominates. This is because percentile scores logically cannot provide reliable 
signals about underlying quantitative differences. As a result, percentile scoring of 
pairwise group contact cannot provide a reliable basis for assessing group residen- 
tial separation and neighborhood polarization. 
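
The contrast between S and G can be made concrete with a short computational sketch. The Python code below is a minimal illustration, not the procedure used to produce the results reported in this study; the function name `s_and_g` and the toy area counts are hypothetical. It scores each area on p, scores it again on its percentile rank (taking the midpoint of the area's position in the household-weighted ranking, which is an assumption about how area width and ties are handled), and then takes the White-minority difference of group means on each scoring, doubling the percentile difference to obtain G. It assumes every area contains at least one household.

```python
def s_and_g(areas):
    """areas: list of (white_count, minority_count) pairs, one per area.

    Returns (S, G) on a 0-100 scale, each computed as a difference of
    group means on an area-level score y:
      S uses y = p, the pairwise percentage White in the area;
      G uses y = percentile rank of the area on p, with the difference doubled.
    """
    total_w = sum(w for w, m in areas)
    total_m = sum(m for w, m in areas)
    total = total_w + total_m

    # Rank areas from lowest to highest proportion White.
    ranked = sorted(areas, key=lambda a: a[0] / (a[0] + a[1]))

    mean_p_w = mean_p_m = 0.0       # group means on p
    mean_pct_w = mean_pct_m = 0.0   # group means on percentile score
    below = 0                       # households in lower-ranked areas
    for w, m in ranked:
        p = 100.0 * w / (w + m)
        pct = 100.0 * (below + (w + m) / 2.0) / total   # mid-rank percentile
        mean_p_w += p * w / total_w
        mean_p_m += p * m / total_m
        mean_pct_w += pct * w / total_w
        mean_pct_m += pct * m / total_m
        below += w + m

    s = mean_p_w - mean_p_m
    g = 2.0 * (mean_pct_w - mean_pct_m)
    return s, g


# Hypothetical city: two areas that are 80% White, eight that are 100% White.
toy = [(80, 20)] * 2 + [(100, 0)] * 8
s, g = s_and_g(toy)
print(round(s, 1), round(g, 1))   # 16.7 83.3
```

In this hypothetical city every area is at least 80% White, so S is low at about 16.7. G is nevertheless about 83.3, because every minority household lives in the two lowest-ranked areas.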

This is not an esoteric point. I will present empirical analyses in the next chapter 
that demonstrate that high scores on G can and do occur when group residential 
separation and neighborhood polarization are low, and in some cases even trivial. 
Ultimately, researchers should decide for themselves if they view this quality of G 
as desirable, undesirable, or irrelevant. But to decide, they first must become aware 
that G has this quality. In the main they are not aware and this is understandable 
because the issue receives little attention in methodological discussions in the litera- 
ture. As a consequence, no one has set forth a well-articulated rationale for prioritiz- 
ing group differences in rank order position on contact over the group differences in 
quantitative “raw score” standing on contact. 

The remaining three indices of uneven distribution considered here, namely the index 
of dissimilarity (D), the Hutchens square root index (R), and the Theil entropy index 
(H), fall in intermediate positions on the continuum between the gini index (G) and 
the separation index (S). Not surprisingly, D is closest to G. R and H fall in between, 
with R closer to D and H closer to S. The basis for this ordering is suggested by the 
y-p relationships for the indices depicted in the graphs in Fig. 5.1. G is at the 
opposite end of the continuum from S because its y-p relationship is the most profoundly 
nonlinear: the percentile scoring of y from p often produces scores for y that depart 
dramatically from the original value of p. The dissimilarity index (D) is closest to 
the gini index (G) because D can be understood as a crude version of G based on a 
two-category ranking scheme. This is indicated visually by the fact that the 
step-function “curve” for the y-p relationship for D overlays the “finer-grained” 
steps in the y-p curve for G seen in the figures. 

The Hutchens square root index (R) falls near the index of dissimilarity (D) 
based on the fact that the y-p curve for R is closer to linear than the y-p curve for G 
but is more nonlinear than the y-p curve for the Theil entropy index (H). Perhaps 
this should not be surprising since Hutchens (2001) notes that R has the quality of 
ranking segregation comparisons in accord with the principle of segregation curve 
dominance. Since the segregation curve is a graphical depiction of rank order differ- 
ences, it makes sense that R is more sensitive to group differences in rank order 
standing on group contact than to group differences in quantitative standing on 
contact. 

The y-p relationship for the Theil entropy index (H) displays only mild departure 
from linearity and thus produces curves that align more closely with the linear y-p 
curve for the separation index (S). On this basis, one can infer that H is more sensi- 
tive to group residential separation and neighborhood polarization than every other 
index except S. 

Each of the indices of uneven distribution considered here (G, D, R, H, and S) 
has been endorsed in methodological studies.⁴ And each has been adopted by 
researchers who have seen the index as having qualities that are attractive for the 
purposes of the studies they were undertaking. The discussion here provides one 
additional basis for choosing among indices — sensitivity to group differences in 
rank order standing on group contact or sensitivity to group differences in contact 
measured in its “natural” metric. This can also be cast in terms of sensitivity to 
group residential separation and neighborhood polarization because this follows 
differences in actual contact, not differences in rank order position on contact. 

If one is interested in identifying “prototypical segregation” as seen in traditional 
exemplars such as White-Black segregation in Chicago and White-Latino segrega- 
tion in Los Angeles, the separation index (S) is a logical choice and the Theil 
entropy index (H) would be the next best choice. The basis for choosing S is this. 


⁴ Of these, D is viewed as most problematic on technical grounds, but it usually receives a 
“conditional pass” because its technical deficiencies often do not have important practical 
consequences. For example, in empirical studies it typically correlates very closely with G, 
its close and technically superior cousin. 

High values on S always signal a high level of group residential separation and 
neighborhood polarization of the kind featured in didactic discussions of examples of 
pronounced segregation. 


This is not strictly the case for high values on H. But the relatively close relationship 
of y scored for H with y scored for S (i.e., with p) dictates that high scores on H are 
very likely, albeit not necessarily guaranteed, to involve a high level of group sepa- 
ration and neighborhood polarization. In contrast, the other three indices — R, D, and 
G — are not reliable in signaling the presence of prototypical segregation that 
involves group separation and neighborhood polarization. 

If one is interested in identifying segregation assessed strictly on rank-order 
standing on group contact (p) as registered by the segregation curve, S and H are not 
good choices. The gini index (G) and the Hutchens square root index (R) would be 
the superior choices on technical grounds and the dissimilarity index (D) would be 
an attractive choice based on past usage, ease of computation and interpretation, and 
related practical considerations. 

A few simple questions can help frame the issues researchers confront when they 
choose to give priority to one index over others. One is “Do the theories and sub- 
stantive concerns motivating analysis of segregation lead one to naturally focus on 
prototypical segregation which involves substantial area racial polarization and 
clear group differences in quantitative levels of contact or do they lead one to instead 
focus on group differences in rank order standing on contact?” If the substantive 
focus is on rank order standing, one should be able to explain why high scores of 
76.3 and 58.2 on G and D, respectively, for the White-Asian comparison are socio- 
logically important in light of the low score of 23.9 on S. The low score on S, as well 
as the component group means on contact with Whites that determine it, document 
that White-Asian segregation in Houston is not “prototypical” segregation. White- 
Asian segregation does not involve substantial group separation and neighborhood 
polarization; Asians are more than twice as likely to live with Whites (mean pair- 
wise contact is 69.9 %) as with Asians (mean pairwise contact is 31.1%). 

In contrast, G and D for White-Latino segregation — at 74.2 and 58.4, respec- 
tively — take values comparable to those observed for White-Asian segregation, but 
White-Latino segregation is more in keeping with prototypical segregation. In con- 
trast to Asians, Latinos are much less likely to live with Whites; Latino pairwise 
contact with Whites is only 40.2% while Latino pairwise contact with Latinos is 
59.8 %. As a result, the score of 40.1 on S for White-Latino segregation indicates 
that group separation and neighborhood polarization is nearly twice as high in the 
White-Latino comparison as in the White-Asian comparison. Similarly, G, D, and S 
are 87.1, 71.0, and 57.4, respectively, for the White-Black comparison. The values 
of G and D are only 10.8 and 13.4 points higher, respectively, than the values 
observed for the White-Asian comparison. But the value of S is some 33.5 points 
higher and is more than double the value of S for the White-Asian comparison. The 
component terms of S for the White-Black comparison indicate clearly that this is 
“prototypical” segregation involving substantial group separation and neighbor- 
hood racial polarization. Consistent with this, both Whites and Blacks live apart in 
neighborhoods where their group predominates. White pairwise contact with Whites 
is 89.9 % and Black pairwise contact with Blacks is 68.5 %. The level of same group 
contact for Blacks is more than double the level of 31.1 % seen for Asians. In sum, 
G and D suggest that all three segregation comparisons are fairly similar. S suggests 
White-Asian segregation is distinctively different from White-Latino and especially 
White-Black segregation. 

Figure 5.1 clarifies why G and D yield high scores for White-Asian segregation 
when S does not. It is because G and D assign great importance to group differences 
on p that have minimal impact on S because they are quantitatively small. S takes a 
relatively low value of 23.8 because Asian pairwise contact with Whites, while not 
reaching the level of 93.8 % seen for Whites, is nevertheless quite high at 69.9 %. To 
calculate G, values of p are converted to percentile scores and the group difference 
is then doubled.⁵ While the group means for p do not necessarily map exactly to the 
group means for percentile scores (because the percentile transformation is nonlinear), 
it is instructive to note that the values of 93.8 and 69.9 for p translate to percentile 
score values of 36.3 and 6.8, respectively. Taking twice the difference to obtain the 
implications for G yields the value of 59.0. Thus, the initial modest difference on p 
that produces a value of 23.8 points for S translates to an implied difference of 59.0 
points for G. This is actually less than the observed value of G of 76.3 which means 
that the exaggeration of group differences on p is consistently larger than this par- 
ticular calculation suggests. 

Applying this same exercise to the group difference of medians also is “instructive.” 
The group medians on p are 97.5 for Whites and 76.7 for Asians. This yields 
a group difference at the medians of 20.8 (which is close to the difference in group 
means of 23.8). These values of p translate to 53.9 and 9.7, respectively, when con- 
verted to percentile scores. When this difference is doubled to obtain the implica- 
tions for G specified as a difference of group medians, the result is 88.4. So the 
original quantitative difference in “typical” residential outcome of 20.8 when p is 
measured in its original metric grows to more than four times that size when p is 
rescaled by the percentile transformation curve shown in Fig. 5.1. 

A similar pattern is observed when values of p are converted from their original 
metric to the 0 or 100 scoring scheme used for D. The values of 69.9 and 76.7, 
which represent the mean and median, respectively, for Asians on p become 0.0. In 
contrast, the values of 93.8 and 97.5, which represent the mean and median, respectively, 
for Whites on p become 100.0. Thus, the original group differences at these 
points of comparison (23.8 points at the group means and 20.8 points at the group 
medians) expand to the maximum possible difference of 100.0. 
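
The arithmetic in the preceding paragraphs can be reproduced in a few lines. The Python sketch below uses only values reported in the text for the White-Asian comparison; the variable and function names are hypothetical and are introduced here for illustration only.

```python
# Values reported in the text for the White-Asian comparison.
pct_at_means = {"white": 36.3, "asian": 6.8}     # percentile scores at the group means on p
p_medians = {"white": 97.5, "asian": 76.7}       # group medians on p
pct_at_medians = {"white": 53.9, "asian": 9.7}   # percentile scores at the group medians
d_threshold = 91.8                               # value of p at which the D step occurs


def gap(scores):
    """White minus Asian difference for a given scoring of contact."""
    return scores["white"] - scores["asian"]


def d_score(p):
    """0 or 100 scoring of contact used for D."""
    return 100.0 if p >= d_threshold else 0.0


implied_g_means = 2 * gap(pct_at_means)
median_gap = gap(p_medians)
implied_g_medians = 2 * gap(pct_at_medians)
d_gap_medians = d_score(p_medians["white"]) - d_score(p_medians["asian"])

print(round(implied_g_means, 1))                  # 59.0
print(round(median_gap, 1))                       # 20.8
print(round(implied_g_medians, 1))                # 88.4
print(round(implied_g_medians / median_gap, 2))   # 4.25
print(d_gap_medians)                              # 100.0
```

The ratio of 4.25 is the basis for the statement that the difference at the medians grows to more than four times its original size under percentile scoring.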

The point to take away is simple, but important. The rescaling of p from its origi- 
nal metric, which determines S, to the scaled contact scores for y that determine G 
and D serves to exaggerate small quantitative differences on p. Accordingly, values 
of G and D are usually larger and are never smaller than values of S.⁶ Furthermore, 
the degree to which the rescaling exaggerates quantitative differences on p is greater 
when groups are unequal in size, as seen in the White-Asian comparison. Accordingly, 
the G-S and D-S discrepancies can be especially large in such comparisons. 

⁵ This is because the maximum possible group difference on percentile scores is 50. 

⁶ Additionally, since D is a crude version of G based on a three-point segregation curve instead of 
the full segregation curve, G is almost always higher and is never lower than D. 

This raises the question, “Why is it appropriate to score y in a way that dramatically 
amplifies group differences in contact with Whites as observed in this example?” 
Relatedly, “In what way is the exaggerated difference of 59.0 points on y 
scored for G and 100.0 points for y scored for D more sociologically meaningful than 
the smaller difference of 23.8 points for y scored for S?” Perhaps compelling 
answers to these questions can be given. For now, however, the measurement litera- 
ture does not provide a ready answer and I am skeptical that a compelling answer 
can be advanced. Regardless, it will remain the case that in these segregation com- 
parisons examining S and its component terms reveals important information that 
would be missed if one looked only at G and D. Specifically, S documents that 
White-Asian segregation does not involve group residential separation and neigh- 
borhood polarization whereas White-Latino segregation and especially White-Black 
segregation do. The practical implication is straightforward; one cannot safely 
assume that high values of G and D indicate a prototypical pattern of segregation. 
One must also examine S to draw a safe conclusion on this issue. 


Chapter 6 
Empirical Relationships Among Indices 


In this chapter I present analyses that document various aspects of the empirical 
relationships among the segregation indices examined in this study. I document 
both situations where the indices consistently agree and also situations where they 
often disagree. I then offer observations on what may be learned from considering 
these two situations. In addition, I use portions of the chapter to review several prac- 
tical issues researchers may want to consider when using the indices in empirical 
studies. 

I start by reviewing results from a large, comprehensive data base of index scores 
for White-Minority segregation comparisons. More specifically, the data base con- 
tains segregation scores for White-Black, White-Latino, and White-Asian compari- 
sons for 960 core-based statistical areas (CBSAs). CBSAs are constructed from 
counties. I applied the 2010 definitions to data from 1990, 2000, and 2010 to obtain index 
scores using constant area boundaries at these three points in time. The full data set 
includes index scores computed using data for three different spatial units — census 
blocks, census block groups, and census tracts. I focus primarily on the scores com- 
puted using block-level data because block groups and census tracts are too large to 
use for assessing segregation in smaller CBSAs. 

Massey and Denton (1988:299) note that multiple options for areal units can be 
conceptually defensible. Citing prior research by Duncan and Duncan (1955b) and 
Taeuber and Taeuber (1965) as well as drawing on their own experiences, Massey 
and Denton also note that, while index scores consistently run higher when segrega- 
tion is calculated using smaller areal units, block-based and tract-based index scores 
tended to correlate closely in the studies they considered. This suggests that findings 
regarding patterns in cross-city variation in segregation and trends over time in seg- 
regation will tend to be similar whether using scores computed from tracts, block 
groups, or blocks. However, an important qualification must be noted on this point. 
It is that these findings are based on studies using a relatively small number (N ≈ 60) 
of large metropolitan areas and the findings do not hold in broader data sets. Thus, 
I obtain findings similar to those reported in these earlier studies when I restrict the 
analysis here to include only the largest metropolitan areas. However, I find that the 
choice of spatial unit is much more consequential when I use the full data set which 
includes hundreds of smaller metropolitan CBSAs and micropolitan CBSAs. 

The reason choice of spatial unit matters more in broader samples is simple; 
tracts are too large to reveal segregation patterns in smaller CBSAs. Indeed, the 
number of tracts in micropolitan CBSAs is often very small — sometimes falling to 
single digits. As a result, tracts are not viable units for assessing segregation in 
smaller communities; tracts consistently yield low scores when closer inspection of 
residential patterns reveals that segregation is clear and pronounced. In contrast, 
census blocks can reliably detect segregation patterns in all CBSAs regardless of 
size. The difference between index scores based on tracts and index scores based on 
blocks is consistently much larger in small- and medium-sized CBSAs. Accordingly, 
I use scores based on block data in analyses involving the full range of metropolitan 
and micropolitan CBSAs. When I use scores based on tract or block group data I 
restrict analysis to include only large metropolitan CBSAs. 

Index scores for my full CBSA analysis data set are based on block-level group 
population counts obtained from Summary File 1 in 2000 and 2010 and from the 
PL-94 (voter redistricting) File for 1990. The data for Whites, Blacks, and Asians do 
not include Latinos and the data for Latinos include persons of all races. The analy- 
ses reported here are based on 4,319 White-Minority comparisons for CBSAs where 
both groups in the segregation comparison have overall population counts of at least 
1,500. In all there are 1,718 White-Black comparisons, 1,754 White-Latino com- 
parisons, and 847 White-Asian comparisons. 

Table 6.1 provides descriptive statistics summarizing the distributions of index 
scores for G, D, R, H, and S obtained for each of the three White-Minority compari- 
sons. Several patterns stand out in the results. One is that scores for G and D consis- 
tently run higher than scores for R, H, and S. This is evident when comparing values 
at the mean and also at the five quantile values examined. A related pattern is that 
scores for R, H, and S are relatively similar at the median and above (i.e., at P50, P75, 
and P90), but scores for H and especially S are noticeably lower below the median 
(i.e., at P25 and especially at P10). The analyses reported in the previous chapter 
provide a basis for understanding both of these patterns. S typically generates smaller 
group differences on contact with Whites because S registers the original untrans- 
formed pairwise contact scores (p). In contrast, G, D, R, and H subject the original 
or “raw” contact scores (p) to a nonlinear rescaling that consistently serves to exag- 
gerate group differences in contact with Whites when the original “raw-score” con- 
tact differences are small (i.e., when average values of p are relatively high for both 
groups) and S is likely to take a low value. As noted in the previous chapter, the 
nonlinearity in the y-p scaling function is more dramatic for G and D. This causes 
their scores to run consistently higher than those of the other indices. One 
practical implication of these findings is that one should keep these inherent “scale” 
differences in index values in mind when making comparisons across different indi- 
ces. For example, as a rule of thumb, I suggest the three- and four-category schemes 
for characterizing levels of segregation in broad categories in Fig. 6.1. 

Table 6.1 Descriptive statistics for indices of uneven distribution for White-Minority comparisons 
for CBSAs for 1990, 2000, and 2010 

                   N of cases   Mean     SD    IQR    IDR    P10    P25    P50    P75    P90
Gini Index (G)
  White-Black         1,718     86.8    6.9    8.4   16.8   77.7   83.3   88.1   91.7   94.5
  White-Latino        1,754     76.1    9.0   12.2   23.7   63.2   70.6   77.3   82.7   87.0
  White-Asian           847     79.8    7.5   10.8   19.6   69.1   74.8   80.9   85.6   88.6
Dissimilarity Index (D)
  White-Black         1,718     71.5    8.6   10.9   21.8   60.3   66.2   72.1   77.2   82.1
  White-Latino        1,754     59.3    9.3   12.8   24.6   46.6   53.1   59.7   65.8   71.3
  White-Asian           847     64.1    8.8   12.6   23.9   52.0   57.8   64.6   70.4   75.9
Hutchens Square Root Index (R)
  White-Black         1,718     51.5   10.9   14.5   28.2   37.5   44.1   51.8   58.7   65.8
  White-Latino        1,754     36.6   10.6   15.4   28.0   22.6   28.6   36.5   44.1   50.5
  White-Asian           847     42.5   10.3   14.7   27.8   28.0   35.3   43.0   50.0   55.8
Theil Entropy Index (H)
  White-Black         1,718     48.9   12.9   18.0   35.0   30.3   39.9   49.4   58.0   65.3
  White-Latino        1,754     32.6    9.1   12.7   22.9   21.1   26.0   32.2   38.7   44.0
  White-Asian           847     30.7    6.8    8.4   16.5   22.6   25.9   30.4   34.4   39.1
Separation Index (S)
  White-Black         1,718     43.5   18.3   28.1   49.7   16.1   29.4   46.4   57.5   65.8
  White-Latino        1,754     25.6   12.3   19.2   33.2    9.6   15.4   24.8   34.6   42.8
  White-Asian           847     15.7    8.4    9.9   20.4    7.7    9.7   13.2   19.6   28.1

Source: Index scores are calculated using block-level data from U.S. Census summary files. 
Comparisons are excluded if the minority group total population is under 1,500. SD is standard 
deviation, IQR is interquartile range, IDR is interdecile range, and P10-P90 are selected percentiles 


Level           G         D         R         H         S
Three Broad Categories
  High        80-100    65-100    50-100
  Medium      50-79     35-64     20-49
  Low          0-49      0-34      0-19
Four Broad Categories
  Very High   85-100    70-100    60-100    60-100    60-100
  High        65-84     50-69     35-59     35-59     35-59
  Medium      50-64     30-49     15-34     15-34     15-34
  Low          0-49      0-29      0-14      0-14      0-14

Fig. 6.1 Suggested schemas for placing index scores within broad groupings for levels of 
segregation 

Regarding group comparisons, all five indices suggest that White-Black segregation 
is consistently higher than both White-Latino segregation and White-Asian 
segregation. Index scores are higher for the White-Black comparison at the mean and 
at every quantile listed in the table. Interestingly, the absolute and relative 
differences in how scores vary across group comparisons are smallest for G, which has the 
highest scores on average, and they are largest for S, which generally takes much 
lower scores. The magnitude of the differences across group comparisons for D, R, 
and H falls in between the larger differences seen for S and the smaller differences 
seen for G. When comparing median values, the maximum difference across group 
comparisons is 10.8 points for G, 12.4 points for D, 15.3 points for R, 19.0 points 
for H, and 33.2 points for S.¹ 

The indices tell a less consistent story regarding the comparison of White-Latino 
segregation and White-Asian segregation. G indicates the two are roughly similar 
but with White-Asian segregation being slightly higher. D and R clearly indicate 
that White-Asian segregation is higher. H indicates the two comparisons are similar 
but with White-Latino segregation being slightly higher. In contrast, S indicates that 
White-Latino segregation is considerably higher than White-Asian segregation. At 
both the mean and the median, S for the White-Latino comparison is higher by about 
10 points or more than S for the White-Asian comparison (25.6 versus 15.7 at the mean 
and 24.8 versus 13.2 at the median in Table 6.1), which puts it at roughly 1.6 to 1.9 
times the level of S for the White-Asian comparison. 

Close inspection of the underlying distributions of residential outcomes reveals 
general patterns similar to those seen in the example for Houston, Texas discussed 
earlier. Specifically, S is higher for the White-Black comparison because 
White-Black segregation routinely involves high levels of group separation and 
neighborhood polarization and S is lower for the White-Asian comparison because 
White-Asian segregation almost never involves even moderate levels of group sepa- 
ration and neighborhood polarization. White-Latino segregation stands in between; 
it routinely involves moderate levels of group separation and polarization and 
occasionally involves high levels. The level of White pairwise contact with Whites 
across CBSAs is very high in all three of these White-Minority comparisons; for 
example, at the median it is 94.4% for White-Black comparisons, 94.5% for 
White-Latino comparisons, and 97.9% for White-Asian comparisons. Thus, the difference 
in S across the different White-Minority comparisons arises primarily due to differ- 
ences in the levels of pairwise contact Blacks, Latinos, and Asians have with Whites. 
For Blacks the median for pairwise contact with Whites across CBSAs is 46.6%, 
for Latinos it is 68.0%, and for Asians it is 84.4%. The “flip” side of these values — 
that is, average pairwise same-group contact for the minority group — tells a similar 
story. It averages 15.6 % for Asians, 33.0 % for Latinos, and 53.4 % for Blacks. 

Taken together, these results reveal that residential separation from Whites is low 
for Asians, moderate for Latinos, and high for Blacks. Recall that, for separation 
and polarization to be high, both groups in the comparison must reside in neighborhoods 
where their group predominates (i.e., when both have high levels of pairwise 
same-group contact). This is why, in sharp contrast to overall or pairwise isolation, 
separation and polarization are independent of city racial composition. Under even 
distribution, an imbalanced racial mix for the city will cause the larger group to 
experience a high level of same-group contact but it also will cause the smaller group to 
experience a low level of same-group contact. So, regardless of city ethnic composition, 
segregating forces must be operating for both groups to have high levels of 
same-group contact. The results just reviewed indicate that Whites consistently 
have high levels of (pairwise) same-group contact. This is not simply due to city 
racial composition. If it were merely a function of racial composition, Blacks, 
Latinos, and Asians also would experience high levels of contact with Whites when 
same-group contact is high for Whites. But the reality is that same-group contact for 
both groups is above the level expected under even distribution. 

¹ Comparisons at the means of the distributions yield similar patterns. 

Other indicators (not reported in the table) further confirm that White-Black seg- 
regation routinely involves substantial group residential separation and neighbor- 
hood polarization while White-Asian segregation almost never does and the pattern 
for White-Latino segregation falls in between. One such indicator is whether at least 
half of the population in both groups in the comparison lives in a neighborhood 
where their group constitutes at least 60% of the population. This outcome can 
never occur under even distribution under any city racial composition. So when it is 
observed, it is a clear sign that segregation dynamics have produced group separa- 
tion and neighborhood polarization. This result is seen in 44.5% of White-Black 
comparisons, 11.8 % of White-Latino comparisons, and only 1.5 % of White-Asian 
comparisons. Thus, clear separation and polarization are rare for White-Asian 
segregation and uncommon for White-Latino segregation but common for White-Black 
segregation. 
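
The indicator just described is simple to compute from area counts. The Python sketch below is a minimal illustration under stated assumptions: the function name `both_groups_polarized` and the example counts are hypothetical, and “the population” of an area is taken to mean the pairwise population of the two groups in the comparison.

```python
def both_groups_polarized(areas, threshold=0.60):
    """areas: list of (group1_count, group2_count) pairs, one per area.

    Returns True when at least half of each group lives in an area where
    its own group makes up at least `threshold` of the pairwise population.
    """
    total_1 = sum(a for a, b in areas)
    total_2 = sum(b for a, b in areas)
    # Members of each group living in areas where their own group predominates.
    in_own_1 = sum(a for a, b in areas if a + b > 0 and a / (a + b) >= threshold)
    in_own_2 = sum(b for a, b in areas if a + b > 0 and b / (a + b) >= threshold)
    return in_own_1 >= total_1 / 2 and in_own_2 >= total_2 / 2


even = [(90, 10)] * 5                        # every area mirrors the city mix
polarized = [(90, 10), (90, 10), (10, 90)]   # each group concentrated in its own areas
print(both_groups_polarized(even))           # False
print(both_groups_polarized(polarized))      # True
```

Under even distribution every area has the same composition, so the two groups cannot both exceed a 60% share of the same area. This is why the indicator can only be satisfied when the groups are unevenly distributed.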


6.1 When Do Indices Agree? When Can They Disagree? 


Table 6.2 presents simple and squared correlations among the scores of the indices 
for White-Minority segregation comparisons for CBSAs in 1990, 2000, and 2010 
previously reported in Table 6.1. Squared correlations are reported above the diago- 
nal and are in bold typeface. Simple linear correlations are reported below the 
diagonal. As noted earlier the full analysis data set includes a total of 4,319 White- 
Minority segregation comparisons where the minority population was 1,500 or 
more. Due to this large sample size all of the correlations reported in the table are 
statistically significant at conventional levels and so statistical significance is not 
specifically noted in the table. As a last preliminary comment, note that the table 
includes correlations for scores for the symmetric version of the Atkinson index 
(A0.5) as an added point of comparison.² 


² This is primarily to document that A is an exact function of R, which is less well known to 
sociologists. 
Table 6.2 Relationships among indices of uneven distribution for White-Minority segregation
comparisons in CBSAs in 1990, 2000, and 2010*

All cases (N = 4,319)

                                G       A       D       R       H       S
G - Gini Index                1.0000  0.9671  0.9679  0.9355  0.7982  0.3031
A - Atkinson Index (A(0.5))   0.9834  1.0000  0.9673  0.9793  0.6277  0.2170
D - Dissimilarity Index       0.9838  0.9835  1.0000  0.9692  0.6838  0.2709
R - Hutchens Index            0.9672  0.9896  0.9845  1.0000  0.6739  0.2520
H - Theil Index               0.8934  0.7923  0.8269  0.8209  1.0000  0.8181
S - Separation Index          0.5505  0.4658  0.5205  0.5020  0.9045  1.0000

*Squared correlations are reported above the diagonal (in bold, italic). Index scores are computed
using block-level data from U.S. Census Summary File 1 and PL-94. Cases are for White-Black,
White-Latino, and White-Asian segregation comparisons excluding CBSAs where the total minority
population is under 1,500


The results in the table document several interesting findings. One is that scores 
for indices that are related to the segregation curve — namely, G, A, D, and R — cor- 
relate very closely. The associations among G, A, and D are particularly high. The 
lowest simple linear correlation among them is 0.984 and the lowest squared cor- 
relation is 0.967. Correlations of R with A and D also are very high. The correlation 
of R with G appears to be lower with a squared correlation of 0.936 but closer 
inspection reveals that G and R have a very close relationship that is mildly nonlinear.
This is not surprising as R has an exact nonlinear relationship with A, specifically
A = 2R − R², which in turn has a close linear relationship with G.
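The exact relationship between A and R can be checked numerically. The sketch below assumes the standard textbook formulas for the two indices (R as one minus the sum of square roots of the products of group shares, and the symmetric Atkinson index as one minus the square of that same sum); the formulas and the toy area counts are supplied here for illustration and are not quoted from this chapter.

```python
import math


def overlap(white, minority):
    """Sum over areas of sqrt(share of Whites * share of minority)."""
    W, B = sum(white), sum(minority)
    return sum(math.sqrt((w / W) * (b / B)) for w, b in zip(white, minority))


def hutchens_r(white, minority):
    """Hutchens square root index (assumed standard formula)."""
    return 1.0 - overlap(white, minority)


def atkinson_half(white, minority):
    """Symmetric Atkinson index A(0.5) (assumed standard formula)."""
    return 1.0 - overlap(white, minority) ** 2


# Hypothetical area counts for a two-group comparison.
white = [900, 700, 400, 100]
minority = [50, 150, 300, 500]

r = hutchens_r(white, minority)
a = atkinson_half(white, minority)
print(round(a, 6), round(2 * r - r * r, 6))
```

Because 1 − (1 − R)² expands to 2R − R², the two printed values agree for any set of counts, not just this example.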

Figure 6.2 provides graphical depictions of the associations among indices 
reported in Table 6.2. The scatterplots make it clear that relationships among these 
four indices — G, A, D, and R — are exceedingly close, even closer than the high 
correlations suggest if one takes account of the mild nonlinearities in several of the 
relationships. Indeed, for any pair of these indices, the multiple squared correlation
for predicting the values of one index based on the value of the other
index plus either its square or its square root (depending on the index combination)
exceeds 0.969 in all cases. These close associations reflect the fact that G,
A, D, and R all assess segregation outcomes consistent with the principle of
segregation curve dominance.³ As noted earlier, this means that all of these indices are
geared to registering group differences in rank order standing on pairwise contact 
with Whites (p). 

The results reported in Table 6.2 also document a second important finding: the
correlations involving H and S are lower than the correlations observed among G, 


3 Specifically, G, A, D, and R satisfy the principle of “segregation curve dominance” which means 
that when comparing two cases the index will indicate that segregation is lower for a case if its 
segregation curve is somewhere above and nowhere below the segregation curve for the other case. 

Fig. 6.2 Scatterplots depicting relationships among indices of uneven distribution for White-
Minority segregation comparisons in CBSAs in 1990, 2000, and 2010 — full analysis sample 
(Note: scores are for White-Black, White-Latino, and White-Asian segregation comparisons com- 
puted using block-level data from U.S. Census summary files) 


A, D, and R. Unlike G, A, D, and R, H and S are not related to the segregation curve. 
It is perhaps not surprising then that H and S are more strongly associated with each 
other (squared correlation of 0.818) than with the other indices. The H-S scatterplot 
in Fig. 6.2 documents that the correspondence between H and S is close at high 
values but is weaker when one of the indices takes a lower value. This accounts for 
why the correlation between H and S is not as high as those seen among G, D, A, 
and R. Generally, but not always, scores for H run higher than scores for S. This 
tendency is more pronounced when S is in the low-to-moderate range (e.g., below 
40). The squared correlations of H with G, A, D, and R are not as high as the squared 
correlation of H with S; but they are moderately strong and run from a low of 0.628 
to a high of 0.798. The squared correlations of S with G, A, D, and R are much 
lower across the board. They run from a low of 0.217 to a high of only 0.303. 

Figure 6.3 presents selected scatterplots from Fig. 6.2 to highlight particular
results. It shows that the correspondence of H with G, D, and R is relatively close at 
high and low values of H, but it is looser in the mid-ranges of H. In the case of the 
relationship of R with H, values of R rarely fall more than a few points below values 
of H; fewer than 10% of cases are lower by more than 5 points and none are
lower by 10 points. However, in the low-to-middle ranges of H (i.e., 25-50), the
values of R often are substantially higher than the values of H with R exceeding H 
by more than 10 points in over a quarter of cases. In the case of the relationship of 
D with H, values of D always are well above values of H and again it is evident that 
the D-H discrepancies are largest in the low-to-middle range of H (i.e., 25-50). A 
similar pattern is seen in the relationship of G with H. Values of G always are well 
above values of H and the G-H discrepancies tend to be largest in the lower middle 
range of H (i.e., 20-40). 

S has a close correspondence with G, D, and R only when values of S are high- 
to-very high. When values of S are not high, the relationships between S and these 
three indices are weak and inconsistent. The reason for this is that values of G, D, 
and R can and frequently do vary over wide ranges when S is at low-to-moderate 
values. To be sure, G, D, and R can and sometimes do agree with S and take low-to- 
moderate values when S takes low-to-moderate values. But G, D, and R also can 
and often do take high values when the value of S is low. 

It is instructive to consider the comparison of S with D. Scores of D are never 
lower than scores of S, but the amount by which D exceeds S can and does vary 
dramatically across comparisons. For example, when S is in the range of 15-25, the 
interdecile range for the difference between D and S is 27.5 points, with more than
10% of scores for D falling below 47 and more than 10% exceeding 73.
Similarly, when S is in the range of 35-45, the interdecile range for the D-S difference
is 22.6 points, with more than 10% of scores for D below 56 and more than 10%
above 78. The patterns for S compared with G are similar. Scores for G are never 
below D and thus run considerably higher than scores for S. But the amount by 
which G exceeds S varies greatly. For example, when S is in the range of 15-25, the 
interdecile range for the difference between G and S is 25.3 points with more than 
10% of scores for G falling below 64 and more than 10% falling above 88. Similarly,
when S is in the range of 35-45, the interdecile range for the G-S difference is 19.0
points with more than 10% of scores of G falling below 73 and more than 10% 
exceeding 91. 

The pattern for S compared with R is similar to those just described for D and G
but with one difference: scores for R occasionally are lower than scores for S. This
is not typical and, when it occurs, R is lower than S only by a small amount. The
more important finding is that the values of R, like values of D and G, can vary
greatly at a given level of S. For example, when S is in the range of 15-25, the
interdecile range for the R-S difference is 31.3 points, with over 10% of scores for R
below 23 and more than 10% above 53. The same variability in scores for R is seen
when S is in the range of 35-45. In this situation, the interdecile range for the R-S
difference is 29.9 points and it is not uncommon to observe scores of R ranging from at
or below 29 to at or above 59.
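The interdecile comparisons above can be reproduced with a small helper. The sketch assumes that "interdecile range" means the 90th percentile minus the 10th percentile of the index difference among cases whose S score falls in a given band, and it uses linear interpolation between order statistics. The function names and the synthetic scores are hypothetical and are not the analysis data.

```python
def percentile(values, q):
    """Percentile (q in 0..100) of a non-empty list, using linear
    interpolation between adjacent order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def interdecile_range_of_gap(pairs, s_low, s_high):
    """P90 minus P10 of (other index - S) among cases with S in [s_low, s_high].

    `pairs` is an iterable of (other_index_score, s_score) tuples."""
    gaps = [other - s for other, s in pairs if s_low <= s <= s_high]
    if not gaps:
        raise ValueError("no cases fall in the requested S band")
    return percentile(gaps, 90) - percentile(gaps, 10)


# Synthetic scores: S fixed at 20 while D ranges from 40 to 80,
# plus one case outside the band that the filter should ignore.
pairs = [(40.0 + k, 20.0) for k in range(41)] + [(90.0, 60.0)]
idr = interdecile_range_of_gap(pairs, 15, 25)
print(idr)
```

A wide interdecile range within a narrow band of S is the pattern the text describes: the other index varies a great deal even when S is nearly constant.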

Fig. 6.3 Scatterplots depicting relationships of H and S with G, D, and R for White-Minority
segregation comparisons in CBSAs in 1990, 2000, and 2010 — full analysis sample (Notes: scores 
are for White-Black, White-Latino, and White-Asian segregation comparisons computed using 
block-level data from U.S. Census summary files) 

Summing up, indices that are closely associated with the segregation curve
(namely, G, A, D, and R) correlate at high levels with each other, but less so with H
and much less so with S, two measures not linked to the segregation curve. These 
findings depart dramatically from previous findings of high correlations among all 
indices of uneven distribution. For example, Duncan and Duncan’s (1955a) land- 
mark methodological study reported that D, G, and S were correlated at high levels 
and suggested the correlations were so high that there was little practical benefit to 
gain from considering measures beyond D which had advantages in ease of calcula- 
tion and interpretation. More recently, the valuable and influential methodological 
study by Massey and Denton (1988) similarly reported very high levels of correlation
among G, A(0.5), D, H, and S with the lowest correlation among the indices
being 0.89 (for the correlation between G and S). 

Why are these correlations reported in these previous studies so high when cor- 
relations of G, A, D, and R with H and S reported here are moderate-to-weak? The 
answer traces to basic differences in research design across the studies. Specifically, 
the difference in findings traces to differences in the samples of cities considered and
to differences in the spatial units used when computing segregation scores. 
Regarding the differences in the samples of cities, the studies by Duncan and 
Duncan (1955a) and Massey and Denton (1988) both were based on 60 cities con- 
sisting primarily of the largest metropolitan areas in the country. Duncan and 
Duncan examined cities for which tract data had been tabulated in the 1940 census 
and the sample was primarily, but not exclusively, comprised of the largest metro- 
politan areas in the country. Massey and Denton developed their analysis sample by 
first taking the 50 largest metropolitan areas and then including an additional 10 
metropolitan areas with large Latino populations. Regarding spatial units, both 
studies used tract-level data when computing segregation scores. While this is a 
common practice, it is not well suited for assessing segregation for smaller groups 
or for assessing segregation in smaller communities. These two aspects of the sam- 
ples used in the landmark studies by Duncan and Duncan (1955a) and Massey and 
Denton (1988) tend to minimize differences between measures that emerge in the 
much broader sample used here. To be clear, the results reported in these earlier 
studies are not incorrect. But the results reported in these studies do not generalize 
beyond large metropolitan areas. 

I provide evidence to support this conclusion with several analyses. To begin I 
replicated the analysis reported in Table 6.2 using a subset sample of 58 CBSAs that 
corresponds as closely as possible to the cities used in Massey and Denton’s (1988) 
study.⁴ I found that the correlations among indices obtained using this subsample
were consistently higher, often by substantial amounts, and were never significantly
lower in comparison to correlations using the broader sample. For example, the cor- 
relation of D and S using scores computed from block data was 0.5205 in the 
broader sample and 0.6433 in the Massey and Denton subsample. I then examined 


4 Two cases in the Massey and Denton sample are not included in the subset of cases examined
here. In the 2010 CBSA definitions used here their areas of Paterson-Clifton and Jersey City are 
assigned to the New York-White Plains-Wayne CBSA Division. 
Table 6.3 Relationships among indices of uneven distribution by size of combined group
populations for White-Minority segregation comparisons in CBSAs in 1990, 2000, and 2010*

CBSAs by size of combined group populations (in 1,000s)

                All          < 100        100-249      250-499      500-999      1,000-1,999  2,000 or more
Correlation of Dissimilarity Index (D) and Separation Index (S)
  Tracts        0.7678       0.6821       0.7042       0.8108       0.8739       0.9154       0.9355
  Block groups  0.7862       0.7467       0.7337       0.8043       0.8634       0.9111       0.9306
  Blocks        0.5205       0.6581       0.4099       0.3123       0.4649       0.6979       0.7190
Squared Correlation of Dissimilarity Index (D) and Separation Index (S)
  Tracts        0.5895       0.4652       0.4959       0.6574       0.7637       0.8380       0.8752
  Block groups  0.6181       0.5576       0.5383       0.6469       0.7470       0.8301       0.8660
  Blocks        0.2709       0.4331       0.1680       0.0975       0.2161       0.4871       0.5170
N of cases      4,319        1,689        1,183        631          392          277          147

*Index scores are computed using data from U.S. Census Summary File 1 and PL-94. Cases are for
White-Black, White-Latino, and White-Asian segregation comparisons excluding CBSAs where
the total minority population is under 1,500


correlations using index scores computed from tract-level data instead of block- 
level data. The correlations among indices increased by substantial amounts and 
closely matched the correlations reported in Massey and Denton (1988). For exam- 
ple, the correlation of scores for D and S based on tract-level data in the subsample 
of cases corresponding to the Massey and Denton sample was 0.9248 and replicates 
the value of 0.92 reported in Massey and Denton. 

These analyses establish that the associations among segregation indices are mark- 
edly lower when study designs draw on a broader sample of cities and assess segrega- 
tion using block data instead of tract data. For example, when computing scores using 
tract data the squared correlation between D and S is 0.8552 (r = 0.9248) for the 
Massey and Denton subsample of CBSAs. It drops to 0.5895 (r = 0.7678) when 
using the broader sample. Both values are much higher than the squared correlation 
of 0.2709 (r = 0.5205) observed for the broader sample of CBSAs using scores com- 
puted from block-level data. 

Table 6.3 explores the issue in more detail by reporting the correlation and 
squared correlation of D and S using subsets of segregation comparisons grouped 
by the size of the populations in the segregation comparisons (a close correlate of 
city population size). Correlations are reported separately for index scores based on 
tract, block group, and block data. Several patterns are clear. 


• Correlations are consistently stronger for scores computed using tract data and
  weaker for scores computed using block data.

• Correlations are stronger for comparisons for CBSAs with populations of
  500,000 or more and even stronger for CBSAs with populations of 1,000,000 or more.
  This pattern holds generally for scores computed using tract, block group, and
  block data.

These results support the general conclusion that correlations between indices
are consistently weaker when using broader, more heterogeneous samples of cities 
and when using index scores computed for blocks instead of tracts. As a final check, 
I replicated these results using alternative versions of index scores that corrected for 
index bias (discussed in Chaps. 14 and 15), a potential concern when using index 
scores computed from block-level data. The relevant results were fundamentally 
similar and strengthen the conclusion I offer here. 

I now answer the questions posed in the heading for this section of the chapter, 
“When do different indices agree?” and “When can they disagree?” The previous 
discussion provides a preliminary answer. Indices are more likely to agree in studies 
that focus on large metropolitan areas and compute index scores using tract-level 
data. Conversely, indices are more likely to disagree in studies that use broader 
samples and/or compute index scores with block-level data. But why is this so? Two 
findings provide clues. One is that cities in the Massey and Denton sample have 
higher levels of relative minority presence and the other is that correlations among 
indices are consistently higher when the relative size of the minority population is 
larger. Among the CBSA segregation comparisons that meet the criterion of having
at least 1,500 in population for the minority group, relative minority presence is 
consistently higher in the subset of CBSAs in the Massey and Denton subsample 
and this is true for all three White-Minority comparisons considered. 

This is consequential because correlations among indices are higher when pairwise
minority group proportions are moderate-to-high.⁵ Evidence for this is presented
in Fig. 6.4 and in Table 6.4. Table 6.4 is organized in three panels. The top
panel gives correlations among index scores computed from block-level data for the 
subset of White-Minority segregation comparisons where the two groups in the 
comparison are similar in relative size; specifically, this is the subset of 510
segregation comparisons where the pairwise proportion for the smaller group in the
comparison is in the range of 0.30-0.50. The key finding documented here is simple
and compelling: the correlations among all of the indices are extremely high. The
weakest relationship observed is between G and R with a simple linear correlation 
of 0.9697 and a squared correlation of 0.9403. Figure 6.4 presents the scatterplots 
for these same relationships. It documents that the relationships are even stronger 
than the simple linear correlations suggest as the lower correlations involve rela- 
tionships that are very close but mildly nonlinear. When the nonlinearities are taken 
into account, all relationships are near exact. For example, the G-R combination has 
the lowest squared linear correlation (0.9403) but regressing G on R and the square 
root of R yields a multiple R-square statistic of 0.9859. 
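The gain from adding a square-root term, as in the regression just described, can be illustrated with a small ordinary least squares routine. This is a sketch only: the helper names are hypothetical, and the data are synthetic values in which one score is an exact square-root function of the other, so they stand in for a mildly nonlinear index relationship and are not the G and R scores analyzed here.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(y, predictors):
    """R-square from OLS of y on an intercept plus the given predictor columns."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic scores: y is an exact concave function of x.
x = [i / 20 for i in range(1, 20)]
y = [math.sqrt(v) for v in x]

linear_only = r_squared(y, [x])
with_root = r_squared(y, [x, [math.sqrt(v) for v in x]])
print(round(linear_only, 4), round(with_root, 4))
```

The linear fit understates how tightly the two scores are related; once the square-root term is included, the fit is essentially exact.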

The middle panel of Table 6.4 presents results for White-Minority segregation 
comparisons where the pairwise proportion for the smaller group in the comparison 
is in the range of 0.10-0.30. The key finding documented here is that, while the correlations
are generally lower, they all remain very high. Thus, the lowest squared


5 More carefully, correlations are higher when the two groups are similar in size; that is, when P
and Q are equal. The distinction is relevant in segregation comparisons where Whites are the 
smaller group; for example, White-Latino segregation in San Antonio and El Paso. 
Fig. 6.4 Scatterplots depicting relationships among indices of uneven distribution for White-
Minority comparisons in 1990, 2000, and 2010: subset of CBSAs with Minority Proportion
≥ 0.30 (Notes: scores are for White-Black, White-Latino, and White-Asian segregation
comparisons computed using block-level data from U.S. Census summary files)


correlation is 0.8660 for the A-S relationship and nine of the fifteen squared correlations
exceed 0.95.

The bottom panel of Table 6.4 reports correlations among index scores for the 
subset of White-Minority segregation comparisons where the minority group is 
small in relative size. Specifically, it reports correlations for cases where the pair- 
wise proportion for the smaller group in the comparison is under 0.10. Two findings 
warrant mention. First, the correlations among G, D, A, and R — the four measures 
related to the segregation curve — remain high; the lowest squared correlation is 
0.9432 for the G-R combination. Second, and more importantly, the squared 
correlations involving H and S — the two measures not related to the segregation 
curve — drop off considerably, especially correlations involving S. The squared cor- 
relation of 0.8370 between H and S is fairly high. But squared correlations of H with 
G, D, A, and R fall in a substantially lower range of 0.7056-0.7543, and the
Table 6.4 Relationships among indices of uneven distribution by group proportions for White-
Minority segregation comparisons in CBSAs in 1990, 2000, and 2010*

                                G       A       D       R       H       S
CBSAs where the smaller pairwise group proportion ≥ 0.30 (N = 510)
G - Gini Index                1.0000  0.9799  0.9795  0.9403  0.9618  0.9690
A - Atkinson Index (A(0.5))   0.9899  1.0000  0.9690  0.9799  0.9825  0.9754
D - Dissimilarity Index       0.9897  0.9844  1.0000  0.9641  0.9843  0.9898
R - Hutchens Index            0.9697  0.9899  0.9819  1.0000  0.9932  0.9789
H - Theil Index               0.9807  0.9912  0.9921  0.9966  1.0000  0.9958
S - Separation Index          0.9844  0.9876  0.9949  0.9894  0.9979  1.0000
CBSAs where the smaller pairwise group proportion ≥ 0.10 and < 0.30 (N = 1,163)
G - Gini Index                1.0000  0.9750  0.9751  0.9339  0.9339  0.8699
A - Atkinson Index (A(0.5))   0.9874  1.0000  0.9748  0.9805  0.9520  0.8660
D - Dissimilarity Index       0.9875  0.9873  1.0000  0.9663  0.9580  0.8857
R - Hutchens Index            0.9664  0.9902  0.9830  1.0000  0.9714  0.8874
H - Theil Index               0.9664  0.9757  0.9788  0.9856  1.0000  0.9694
S - Separation Index          0.9327  0.9306  0.9411  0.9420  0.9846  1.0000
CBSAs where the smaller pairwise group proportion < 0.10 (N = 2,646)
G - Gini Index                1.0000  0.9761  0.9628  0.9432  0.7515  0.3756
A - Atkinson Index (A(0.5))   0.9880  1.0000  0.9791  0.9801  0.7056  0.3132
D - Dissimilarity Index       0.9812  0.9895  1.0000  0.9797  0.7271  0.3325
R - Hutchens Index            0.9712  0.9900  0.9898  1.0000  0.7543  0.3581
H - Theil Index               0.8669  0.8400  0.8527  0.8685  1.0000  0.8370
S - Separation Index          0.6129  0.5596  0.5766  0.5984  0.9149  1.0000

*Squared correlations are reported above the diagonal (in bold, italic). Index scores are computed
using block-level data from U.S. Census Summary File 1 and PL-94. Cases are for White-Black,
White-Latino, and White-Asian segregation comparisons excluding CBSAs where the minority
population is under 1,500


squared correlations of S with these measures fall in a much lower range of 
0.3132-0.3756. 
I highlight the most important points of the above discussion as follows. 


• Scores for all popular segregation indices consistently agree and correlate closely
  with one another when the two groups in the comparison are similar in size.

• Scores for popular segregation indices that are closely related to the segregation
  curve (G, D, A, and R) consistently agree and correlate closely with one another
  regardless of relative group size.

• Scores for popular segregation indices not related to the segregation curve (H
  and S) correlate closely with each other even when relative group size is
  imbalanced (i.e., when the pairwise proportion for the smaller group is under 0.10).

• Scores for H and S correlate closely with scores for G, D, A, and R when relative
  group size is relatively balanced (i.e., when the pairwise proportion for the
  smaller group is ≥ 0.10). But the correlations fall off substantially, especially
  those involving S, when the pairwise proportion for the minority group is low
  (i.e., below 0.10).


6.2 Why Does Relative Group Size Matter? 


The difference of means framework provides a basis for gaining insight into these 
findings. In this framework segregation index scores are obtained as differences of 
group means on segregation-relevant residential outcomes (y) that are scored from 
area proportion White (p) via index-specific scaling functions y = f(p). It is obvious
that scores for different indices will correlate more closely when the index-specific
scaling functions y = f(p) for the indices involved are similar. Conversely,
correlations among scores will be lower when the scaling functions involved differ. 
The graphs in Fig. 5.1 introduced earlier documented how the scaling functions 
vary across indices. In the case of S, the scaling function is linear. The scaling func- 
tions for the other indices are nonlinear with nonlinearity being more pronounced 
for some indices than for others. Specifically, the graphs in Fig. 5.1 documented that 
the nonlinearity is least pronounced for H and progressively more pronounced for 
R, D, and G. This helps explain why scores for G, D, and R consistently correlate 
closely. It also helps explain why scores for S correlate more closely with scores for 
H than with scores for G, D, and R. 
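A minimal sketch of the difference-of-means computation may help fix ideas. It scores each area with y = f(p) and subtracts the minority mean from the White mean. The linear scaling for S follows the text above; the step at parity used for D (y = 1 when p ≥ P, otherwise 0) is an assumption supplied here from standard algebra rather than quoted from this section, and the function names and toy counts are hypothetical. Every area is assumed to be populated.

```python
def difference_of_means(white, minority, scale):
    """White mean minus minority mean of y = scale(p, P), where p is area
    proportion White and P is city proportion White (pairwise counts)."""
    W, B = sum(white), sum(minority)
    P = W / (W + B)
    y = [scale(w / (w + b), P) for w, b in zip(white, minority)]
    mean_white = sum(w * yi for w, yi in zip(white, y)) / W
    mean_minority = sum(b * yi for b, yi in zip(minority, y)) / B
    return mean_white - mean_minority


def scale_s(p, P):
    """Separation index: y is simply area proportion White."""
    return p


def scale_d(p, P):
    """Dissimilarity index (assumed form): 1 at or above parity, else 0."""
    return 1.0 if p >= P else 0.0


def separation_direct(white, minority):
    """Variance-ratio form: sum of t * (p - P)^2 divided by T * P * Q."""
    T = sum(white) + sum(minority)
    P = sum(white) / T
    num = sum((w + b) * (w / (w + b) - P) ** 2 for w, b in zip(white, minority))
    return num / (T * P * (1.0 - P))


def dissimilarity_direct(white, minority):
    """Conventional form: half the sum of absolute differences in area shares."""
    W, B = sum(white), sum(minority)
    return 0.5 * sum(abs(w / W - b / B) for w, b in zip(white, minority))


# Hypothetical area counts.
white = [900, 700, 400, 100]
minority = [50, 150, 300, 500]

s_means = difference_of_means(white, minority, scale_s)
d_means = difference_of_means(white, minority, scale_d)
print(round(s_means, 4), round(separation_direct(white, minority), 4))
print(round(d_means, 4), round(dissimilarity_direct(white, minority), 4))
```

The two computations differ only in the scaling function passed in, which is the sense in which index scores diverge when, and only when, their scaling functions diverge.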

The scaling function for S is invariant across variation in relative group size; y 
is always a simple, one-to-one linear function of p. Significantly, the scaling func- 
tions for all of the other indices vary systematically with relative group size. 
Specifically, the “amplitude” of the nonlinearity in the scaling function is most
pronounced when relative group size is highly imbalanced and it is least
pronounced when relative group size is equal (i.e., 50/50). Figures 6.5 and 6.6
document this for the Theil index (H) and the Hutchens square root index (R) by plotting
the scaling function y = f(p) with values of relative group size (P) set variously at
0.01, 0.05, 0.20, 0.50, 0.80, 0.95, and 0.99. The variation in nonlinearity is
particularly easy to summarize for these two functions because they are smooth and
continuous. Nonlinearities in the scaling functions for G and D behave in a similar
manner, but are more complicated visually because the functions involve
monotonic but irregular step functions.

Figures 6.5 and 6.6 show that in all four cases the nonlinear function's departure
from linearity is mildest when groups in the segregation comparison are similar in 
size and it grows increasingly more pronounced as groups become more unequal in 
size. Since the scaling function for S is always linear, this explains why scores for S 
correlate more closely with scores for the other indices when groups are equal in 
size and less closely, sometimes markedly so, when the two groups in the compari- 
son are unequal in size. In general, the difference between any two index-specific 
scaling functions is least pronounced when groups are equal in size and it grows 
Fig. 6.5 Scoring y = f(p) for computing Theil’s H as a difference of group means on scaled contact
(Curves reflect y = f(p) for Theil’s H based on y = Q + [(E − e)/E] / (p/P − q/Q) for selected values of
proportion White in the city (P). Moving from the top curve to the bottom curve, the selected
values for P are: 0.01, 0.05, 0.20, 0.50, 0.80, 0.95, and 0.99, respectively. The diagonal line reflects
y = f(p) for S)


larger as groups become more unequal in size. This accounts for why index scores 
generally correlate more closely when groups are equal in size and correlate less 
closely when groups are unequal in size. 
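The scaling functions given in the captions of Figs. 6.5 and 6.6 can be checked against the conventional index formulas. The sketch below assumes that E and e are the two-group entropies of the city and of the area (natural logs), and that the conventional forms of H and R are the population-weighted mean of (E − e)/E and one minus the sum of square roots of the products of group shares. The function names, the placeholder value returned at exact parity, and the toy counts are hypothetical.

```python
import math


def entropy(p):
    """Two-group entropy for proportion p (natural logs; 0 * log 0 taken as 0)."""
    return -sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0)


def scaled_contact(p, P, index):
    """y = f(p) for index 'H' or 'R', following the figure captions."""
    Q, q = 1.0 - P, 1.0 - p
    gap = p / P - q / Q
    if gap == 0:
        # An area at parity carries zero weight in the difference of means,
        # so any finite placeholder works; Q is used here.
        return Q
    if index == "H":
        E = entropy(P)
        numerator = (E - entropy(p)) / E
    else:
        numerator = 1.0 - math.sqrt(p * q / (P * Q))
    return Q + numerator / gap


def difference_of_means(white, minority, index):
    """White mean minus minority mean of the scaled contact scores."""
    W, B = sum(white), sum(minority)
    P = W / (W + B)
    y = [scaled_contact(w / (w + b), P, index) for w, b in zip(white, minority)]
    mean_white = sum(w * yi for w, yi in zip(white, y)) / W
    mean_minority = sum(b * yi for b, yi in zip(minority, y)) / B
    return mean_white - mean_minority


def theil_h(white, minority):
    """Conventional H: population-weighted mean of (E - e) / E."""
    T = sum(white) + sum(minority)
    E = entropy(sum(white) / T)
    return sum(
        (w + b) / T * (E - entropy(w / (w + b))) / E
        for w, b in zip(white, minority)
    )


def hutchens_r(white, minority):
    """Conventional R: one minus the sum of sqrt(White share * minority share)."""
    W, B = sum(white), sum(minority)
    return 1.0 - sum(math.sqrt((w / W) * (b / B)) for w, b in zip(white, minority))


# Hypothetical area counts.
white = [900, 700, 400, 100]
minority = [50, 150, 300, 500]

h_means = difference_of_means(white, minority, "H")
r_means = difference_of_means(white, minority, "R")
print(round(h_means, 4), round(theil_h(white, minority), 4))
print(round(r_means, 4), round(hutchens_r(white, minority), 4))
```

The constant Q in each scaling function drops out of the difference of means, because the area weights for the two groups each sum to one; only the nonlinear term contributes to the index score.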

The potential discrepancies between scores for different indices follow a very 
clear pattern. At one end of the spectrum there are indices like G and D which reg- 
ister residential outcome scores (y) based on scaling functions that involve more 
pronounced nonlinearities (as seen in Fig. 5.1). On the other end of the spectrum are 
indices like H and S which register residential outcome scores (y) based on scaling 
functions that involve only mild nonlinearity (H) or simple linear scaling (S). Under 
all conditions scores for G and D consistently run higher than scores for H and 
S. But there are big differences in how this plays out depending on the group size 
comparison. When group size is relatively balanced (e.g., pairwise proportion for 
the smaller group is 0.15 or higher), scores for G and D will run higher than scores 
for H and S and will fall in a narrow range of variation at any particular level of H 
or S. In contrast, when group size is imbalanced (e.g., pairwise proportion for the 
smaller group under 0.10), scores for G and D will run higher than scores for H and 
S but they may fall in a sizeable range of variation at any particular level of H or S. 

Fig. 6.6 Scoring y = f(p) for computing Hutchens’ R as a difference of group means on scaled
contact. Curves reflect y = f(p) for Hutchens’ R based on y = Q + (1 − √(pq/PQ)) / (p/P − q/Q) for
selected values of proportion White in the city (P). Moving from the top curve to the bottom curve,
the selected values of P are: 0.01, 0.05, 0.20, 0.50, 0.80, 0.95, and 0.99, respectively. The diagonal
line reflects y = f(p) for S

6 Results for G and R are not shown, but are similar. I highlight results for D because it is used more
often in empirical studies.

This is documented in Fig. 6.7, which plots values of D against values of H and S
for three sets of cases.⁶ The first panel of the figure depicts the D-H and D-S
relationships for all CBSAs. The second panel depicts the same relationships for the
subset of CBSAs where the pairwise proportion for the smaller group is 0.15 or 
higher. The third panel depicts the relationships for the subset of CBSAs where the 
pairwise proportion for the smaller group is below 0.10. Note that I exclude CBSAs 
with the very lowest values (i.e., values below 0.02) on pairwise group proportion 
so it will be clear that the pattern observed in this panel is not determined by extreme 
cases. The scatterplots in the second panel document that, when the groups in the 
segregation comparison are relatively similar in size, D varies in a narrow range at 
any particular level of H or S. The scatterplots in the third panel document that, 
when the groups in the segregation comparison are somewhat unequal in size, D 
varies in a much larger range at any specific level of H and S. On the low end, the 
variation in D extends down to the levels seen in the second panel of the figure. On 
the high end the variation in D is considerable and often ranges 25-35 points above 
scores on the low end. The first panel combines the CBSAs in the second and third 
panels and also includes CBSAs where the smaller group meets the group size 
requirement of 1,500 in population but has a pairwise proportion of less than 0.02. 
This amplifies the pattern seen in the third panel by extending the range of variation 
on both the high and low ends at any given level of H and S. 

Figure 6.7 documents that popular indices of uneven distribution can and often 
do yield highly discrepant results. When this happens, a specific substantive inter- 
pretation applies. The pattern of segregation in these situations involves extensive 
group differences in displacement from parity but does not involve high levels of
group residential separation and neighborhood polarization. The combination 
comes about because indices such as G, D, and R can respond with high scores 
when displacement from parity involves group differences in pairwise contact that 
are quantitatively small. Indices that register group separation and neighborhood 
polarization take low values in these situations because the two groups are living 
together, not apart, with most minority individuals living with Whites and few resid- 
ing in predominantly minority residential areas (e.g., ghettos and barrios). It is 
important to be aware of this possibility for many reasons, not the least of which
is that it affects the potential policy implications of eliminating uneven distribution.
When group separation and area polarization are absent, majority-minority
differences in residential outcomes will change little when uneven distribution is 
eliminated. When separation and polarization are present, the residential outcomes 
experienced by minority individuals can potentially change dramatically when 
uneven distribution is eliminated. I believe this is an important aspect of the corre- 
spondence, or lack of it, between different indices. Accordingly, I review the issue 
in more detail in Chaps. 7 and 8. 
Chapter 7 
Distinctions Between Displacement 
and Separation 


The previous chapter documents that the separation index (S) can reveal the pres- 
ence of important aspects of residential segregation that cannot be reliably estab- 
lished by examining the more widely used dissimilarity index (D). Specifically, S 
reliably indicates whether groups are separated and live apart from each other in 
different areas of the city and experience substantially different residential out- 
comes — at minimum with respect to area racial composition and potentially also on 
other neighborhood outcomes that co-vary with area racial composition. High val- 
ues on S thus signal that groups are residentially separated and reside apart from 
each other in areas that are polarized on racial composition. The same cannot be 
said for D. To the contrary, D can and sometimes does take high values when two 
groups are not residentially separated and in fact live together in the same neighbor- 
hoods and experience quantitatively similar residential outcomes on area racial 
composition. Thus, high values on D cannot and do not reliably signal the presence 
of group residential separation and neighborhood racial polarization. 

I view the issue of whether groups are separated and live apart in different neigh- 
borhoods or live together in the same areas and share neighborhood outcomes as 
fundamental to segregation research. The following two quotes from Massey and 
Denton’s (1988) landmark methodological study are consistent with this view. 
Speaking of segregation in broad terms they state “At a general level, residential 
segregation is the degree to which two or more groups live separately from one 
another, in different parts of the urban environment.” (1988:282, emphasis added). 
Speaking more specifically of the dimension of uneven distribution they state 
“Evenness is minimized and segregation maximized when no minority and majority 
members share a common area of residence” (1988:284). 

These statements resonate with prevailing substantive intuitions about residential 
segregation. Researchers and broader audiences alike presume that high scores on 
segregation signal that the groups in the comparison live apart from each other in 
different neighborhoods and thus do not share common fate based on area of 
residence. The separation index (S) provides a reliable signal on this count. The 
dissimilarity index (D) does not. D does not because it measures something different 
from whether groups live together or apart. Specifically, D provides a reliable signal 
regarding whether groups differ in their extent of being displaced from parity. 
Significantly, however, group differences in being displaced from parity and group 
residential separation are not the same things and they do not necessarily correlate 
closely empirically. Displacement and separation often do take high values together. 
But, importantly, group difference in displacement from parity can be high when 
group residential separation is low. 

In this chapter I seek to clarify the differences between separation (S) and dis- 
placement (D) in more detail. I begin by noting that the issue has become more 
important in recent decades because conceptual distinctions between separation and 
displacement have come to take on greater practical significance in empirical analy- 
ses. The main reason for this is that the scope of segregation studies has expanded 
and the racial demography of urban areas in the United States has become more 
complex. As a result, researchers are now frequently investigating segregation in 
situations where large differences between scores on separation and displacement 
are more common than was the case in an earlier era of segregation research. 

I frame the substantive issues involved by introducing two terms. The first is 
“prototypical segregation,” which is associated with a pattern of “concentrated 
displacement.” The second is the opposite condition of “dispersed displacement,” a 
pattern of segregation that is empirically common but largely unrecognized in the 
measurement literature. 

In the pattern of “prototypical segregation” displacement from even distribution 
concentrates the populations of the two groups into homogeneous areas that differ 
by quantitatively large amounts on area racial composition. When such a pattern of 
“concentrated displacement” is present, group residential separation and area racial 
polarization as indicated by S will approach the maximum levels possible at a given 
level of displacement from parity as indicated by D. In the logical extreme where 
displacement is concentrated to the maximum possible extent, the value of S will 
equal the value of D. The pattern of “dispersed displacement” is at the opposite end 
of the spectrum. Under this pattern levels of group residential separation and area 
racial polarization are far below the maximum levels possible for a given level of 
displacement. In sum, under “prototypical segregation” involving concentrated dis- 
placement values of D and S correspond closely. Under dispersed displacement, 
values of D and S diverge by large amounts. 

I next explore these issues in two extended technical discussions that clarify the 
basis for D-S congruence and divergence. In the first discussion I contrast how D 
and S respond differently to residential exchanges that promote integration and/or 
segregation and I describe how this can lead to D and S taking either similar or dis- 
crepant values. In the second discussion I introduce simple analytic models that 
reveal more precisely how displacement (D) and separation (S) can vary indepen- 
dently to produce residential patterns ranging from “prototypical segregation” to 
“dispersed displacement” at any given combination of displacement (D) and overall 
city racial composition (P). I then “exercise” the models to produce graphical results 
that reveal the nature and range of potential combinations of displacement (D) and 
group separation (S) by level of city racial composition. 

I close the chapter by considering the question of whether displacement (D) and 
separation (S) should be seen as distinctly different dimensions of segregation. My 
discussion gives attention to three alternative views. One is the position suggested 
by Stearns and Logan (1986) which holds that group separation and area racial 
polarization should be seen as a distinctive dimension of segregation to be consid- 
ered along with uneven distribution and exposure. Another view takes the position 
that group separation and area racial polarization can be seen as an important aspect 
of uneven distribution that may or may not be present when group distributions are 
displaced from even distribution. I also consider and dismiss a mistaken third view, 
sometimes suggested in the literature, that group separation and area polarization 
reflects exposure. 

In the end I endorse a practical compromise. In my view it ultimately is not cru- 
cial whether one classifies group separation and area racial polarization as a distinct 
dimension of segregation or as a particular aspect of uneven distribution. What is 
crucial is for researchers to recognize that separation, displacement, and exposure 
all provide useful information and all three can and do vary independently in empir- 
ical analyses. This knowledge will help researchers choose measures that best serve 
their research interests. My view is that this will lead researchers to pay closer atten- 
tion to group separation and area polarization as measured by S because S provides 
a reliable signal about the presence or absence of “prototypical segregation” which 
researchers and broad audiences alike find more interesting and compelling than 
“dispersed displacement”. 


7.1 The Increasing Practical Importance of the Distinction 
Between Displacement and Separation 


Stearns and Logan (1986) argued that the distinction between D and S is important 
noting that the measures “are responsive to different aspects of changes in racial 
residential patterns” and can “lead to divergent, sometimes contradictory, results” 
(1986:125-126). To support their view they noted the example of Logan and 
Schneider (1984) who found that D and S gave different results regarding trends in 
White-Black segregation in suburban areas with S showing increasing segregation 
while D indicated declining segregation. Studies by Schnare (1980) and Smith 
(1991) also reported finding different patterns and trends in residential segregation 
when using D and S. 

Coleman et al. (1982:177-179) had previously argued that D and S differ in ability 
to provide a reliable signal of when groups have important differences in residential 
outcomes and noted that D can take high values when the two groups in the 
comparison have fundamentally similar distributions on residential outcomes. 
Zoloth (1976) made similar points in an earlier methodological study. Unfortunately, 
the findings and observations reported in these studies have had minimal impact on 
prevailing practices in segregation research. Empirical studies overwhelmingly use 
D over alternative measures and typically do not report whether findings are similar 
or different depending on whether alternative indices such as S are used. This sug- 
gests that researchers generally are not aware of two points. The first is that D and 
S can take highly discrepant scores and can move in different directions. The second 
is that whether scores for D and S align or diverge has important implications 
about fundamental aspects of the nature of segregation. 

Prevailing practices may have been more understandable and less consequential 
in an earlier era of segregation research. For many decades empirical studies focused 
primarily on White-Black segregation in large metropolitan areas where Black pop- 
ulations were substantial in size and typically were concentrated in large ghettos. 
The empirical analyses in Chap. 6 showed that discrepancies between displacement 
(D) and separation (S) tend to be less dramatic when analysis is restricted to 
this particular subset of segregation comparisons. So, while D and S are not exactly 
interchangeable in these situations, displacement typically is highly concentrated. 
As a result the values of D and S tend to correlate closely and index choice may be 
less likely to lead to important practical differences in findings. 

Times have changed. The racial and ethnic composition of cities in the United 
States has undergone dramatic demographic transformation. Additionally, the scope 
of segregation studies has expanded to consider segregation across a wider range of 
group comparisons and a wider range of community settings. In these new circum- 
stances of segregation research, researchers cannot safely assume that index choice 
does not matter. To the contrary, nowadays the logical differences between dis- 
placement (D) and separation (S) routinely take on greater practical importance. 
Over the last four decades the Latino and Asian populations have grown rapidly and 
diffused from traditional settlement areas to wider distribution nationally. 
Consequently, segregation studies now examine a broader range of group compari- 
sons beyond the earlier narrow focus on White-Black segregation and routinely give 
attention to White-Latino and White-Asian segregation. Additionally, the focus of 
research has expanded beyond considering just large metropolitan areas where 
minority presence often is sizeable. Empirical studies now increasingly consider a 
broader range of communities including communities where minority population 
presence is relatively small. This is reflected, for example, in studies that examine 
White-Latino and White-Asian segregation in “new destination” communities 
where Latino and Asian populations are newly arrived and growing rapidly. 
Additionally, segregation studies nowadays investigate segregation over an increas- 
ingly wide range of settings including not only the largest metropolitan areas but 
also smaller metropolitan areas, micropolitan areas, noncore counties, and small 
towns. 

All of these trends make the topic of this chapter more relevant to current and 
future segregation studies. The empirical analyses of White-Minority segregation 
across CBSAs reviewed in Chap. 6 document that the correlation of D and S is weaker 
when examining White-Latino and especially White-Asian segregation, weaker 
when examining segregation in smaller communities, and weaker in communities 
where minorities are smaller in relative size. As a result, we should expect 
discrepancies between scores for D and S to be increasingly common and larger in size and 
for the discrepancies to carry increasing substantive importance. Accordingly, it is 
useful to understand the substantive issues that are relevant when D and S align and 
when D and S diverge. To serve this goal I now explore the notion of “prototypical 
segregation” and the contrast between “concentrated” and “dispersed” 
displacement. 


7.2 Prototypical Segregation and Concentrated 
Versus Dispersed Displacement 


I use the term “displacement” to refer to group differences in distribution across 
neighborhoods that are above or below “parity.” Taking Whites as the reference 
group in an analysis of White-Black segregation, displacement is high when a 
large share or proportion of the White population resides in “above-parity” areas (i.e., 
where p_i > P) and a similarly large share or proportion of the Black population 
resides in “below-parity” areas. Alternatively, displacement is high when Whites 
and Blacks differ on the proportion residing in “above-parity” areas or on the 
proportion residing in “below-parity” areas. These are all slightly different ways 
of describing the same arrangement and all result in the same values on displace- 
ment as measured by D. 

Significantly, the notion of displacement from parity does not specify anything 
further about group residential distributions beyond the narrow confines of what 
was just stated. Displacement varies in extensiveness — the degree to which it 
involves large differences in group proportions. But extensiveness of displacement 
does not carry specific implications for the quantitative magnitude of the differences 
in area racial composition between above-parity neighborhoods and below-parity 
neighborhoods. To the contrary, the magnitude of the differences involved can vary 
dramatically at a given level of displacement. The notion of displacement is captured 
well by the dissimilarity index (D) as its value directly registers majority-minority 
differences in proportions residing in “parity” or “above-parity” areas.¹ 
This quality of D was recognized by Duncan and Duncan (1955) who referred to D 
as the “displacement index.” 
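
The definition just given can be illustrated with a minimal Python sketch. It computes D as the White-Black difference in the share of each group residing in above-parity areas and compares the result with the familiar half-sum-of-absolute-differences computing formula. The function names and the four-area counts are hypothetical and introduced only for illustration; they do not come from the analyses reported in this monograph.

```python
def dissimilarity_from_parity_shares(white, black):
    """D as the White-Black difference in the share of each group in above-parity areas.

    white, black: per-area household counts (hypothetical inputs).
    """
    W, B = sum(white), sum(black)
    share_white = share_black = 0.0
    for w, b in zip(white, black):
        # An area is above parity when p_i = w/(w+b) exceeds P = W/(W+B).
        # Cross-multiplying (w*B > b*W) avoids floating-point ties at parity.
        if w * B > b * W:
            share_white += w / W
            share_black += b / B
    return share_white - share_black


def dissimilarity_conventional(white, black):
    """Familiar computing formula: half the sum of absolute differences in area shares."""
    W, B = sum(white), sum(black)
    return 0.5 * sum(abs(w / W - b / B) for w, b in zip(white, black))


# Hypothetical four-area city with 200 White and 200 Black households.
white = [90, 80, 20, 10]
black = [10, 20, 80, 90]
d_shares = dissimilarity_from_parity_shares(white, black)
d_formula = dissimilarity_conventional(white, black)
print(round(d_shares, 4), round(d_formula, 4))
```

In this illustrative city 85 percent of Whites and 15 percent of Blacks reside in above-parity areas, so both computations give a value of 0.70.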

I use the terms “group separation” and “neighborhood polarization” to refer to 
residential distributions that are characterized by groups living apart from each 
other such that members of both of the groups in the comparison are disproportion- 
ately located in areas where their own group predominates. Significantly, displace- 
ment does not necessarily involve group separation. Thus, D is not a valid proxy for 
group separation. Whether or not displacement involves separation depends on 
an additional consideration, namely, whether displacement is “concentrated” or 


¹ Alternatively, the value of D can be obtained from the Black-White difference in group proportions 
residing in “below-parity” areas. 
“dispersed.” Under concentrated displacement both groups reside apart from each 
other in racially homogeneous areas that differ markedly on racial composition. 
Under dispersed displacement, the groups reside together in areas that differ mod- 
estly on racial composition. 

To clarify, at a given level of displacement, separation is maximized when dis- 
placement is concentrated in a way that maximizes same-group contact for both 
groups.² Conversely, group separation is minimized when displacement is dispersed 
in a way that produces a low level of same-group contact for at least one of the two 
groups. The notion of separation just outlined is captured well by S which registers 
the majority-minority difference in (pairwise) contact with the majority group. 
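
As a hedged illustration of this characterization, the following Python sketch computes S as the difference between Whites' and Blacks' average (pairwise) contact with Whites, where contact is scored as area proportion White. The function name and the area counts are hypothetical, and the sketch uses the simple area proportion without any refinement for how individuals themselves are counted.

```python
def separation_index(white, black):
    """S as the White-Black difference in mean (pairwise) contact with Whites."""
    W, B = sum(white), sum(black)
    contact_white = contact_black = 0.0
    for w, b in zip(white, black):
        if w + b == 0:
            continue  # empty areas contribute nothing
        p = w / (w + b)               # area proportion White among the two groups
        contact_white += (w / W) * p  # Whites' average contact with Whites
        contact_black += (b / B) * p  # Blacks' average contact with Whites
    return contact_white - contact_black


# Hypothetical four-area city with 200 White and 200 Black households.
white = [90, 80, 20, 10]
black = [10, 20, 80, 90]
s = separation_index(white, black)
print(round(s, 4))
```

For these illustrative counts Whites' average contact with Whites is 0.75 and Blacks' is 0.25, giving S = 0.50.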


7.2.1 Prototypical Segregation 


I use the term “prototypical segregation” to refer to a residential pattern where 
group separation approaches the maximum that can occur at a given level of dis- 
placement. I characterize this pattern as prototypical because, without exception so 
far as I have been able to find, this is the pattern of segregation depicted when 
examples of high levels of segregation are introduced and reviewed in didactic dis- 
cussions of residential segregation. For example, it is the kind of segregation pattern 
seen in didactic illustrations and discussions provided by Taeuber (1964), Taeuber 
and Taeuber (1965), Jaret (1995), and Iceland et al. (2000). It also is the kind of 
segregation pattern seen in familiar examples of high levels of segregation as 
observed for White-Black segregation in cities such as Chicago, Detroit, Cleveland, 
and Milwaukee and as observed for White-Latino segregation in Los Angeles. What 
is common in these situations of prototypical White-Minority segregation is this: 
White households are living in above-parity neighborhoods that are predominantly 
White in racial composition and, similarly, minority households are living apart 
from Whites in below-parity neighborhoods that are predominantly minority in 
racial composition. Accordingly, non-parity areas are “polarized” into areas that 
differ greatly on racial composition with Whites being concentrated in predomi- 
nantly White areas and minorities being concentrated in predominantly minority 
areas typically forming enclaves, barrios, and ghettos. 

Values of D and S correspond closely when the condition of prototypical segregation 
holds because displacement from parity is concentrated rather than dispersed. 
When prototypical segregation is pronounced, values of both D and S are high; 
displacement from parity is extensive for both groups and the populations residing 
in non-parity areas are concentrated into areas that are ethnically homogeneous. 
Because the two groups live apart in neighborhoods that are fundamentally different 
in terms of racial composition, residential redistribution that substantially reduces 


² I place emphasis on “for both groups” because this distinguishes separation from simple same-group 
contact and isolation. Isolation is intrinsically affected by city racial composition and 
separation is not. 
or eliminates displacement from even distribution also will bring about correspondingly 
large quantitative changes in neighborhood racial composition. This will in 
turn carry the potential to also bring about large changes in group differences on 
neighborhood outcomes that are correlated with area racial composition (e.g., social 
problems, amenities, services, etc.). 

My strong sense is that broad audiences, most academics, and even many segre- 
gation researchers generally assume that the residential patterns associated with 
“prototypical” segregation will be present when scores on widely used segregation 
indices such as the dissimilarity index (D) are high. This assumption is mistaken. In 
fairness, however, it is easy to understand why this mistaken view is so widely held. 
Standard examples and didactic discussions encourage the assumption and little in 
the standard methodological literature cautions otherwise. That is, 


Methodological discussions that present examples illustrating how residential segregation 
is captured by the segregation curve and the dissimilarity index (D) rarely, if ever, feature 
residential distributions with low group separation (S) resulting from dispersed displace- 
ment. Instead, they feature residential distributions with high levels of group separation 
resulting from concentrated displacement. 


As a result, the prevailing understanding of segregation measurement rests on a 
widely shared but incorrect assumption that high scores on popular segregation 
indices always signal the condition of prototypical segregation involving concen- 
trated displacement and group residential separation. This is not the case. In particu- 
lar, high values of the dissimilarity index (D), the most widely used segregation 
index, do not and intrinsically cannot provide a reliable signal about the presence of 
prototypical segregation.³ In contrast, high values of the separation index (S) provide 
a certain indication that a high level of prototypical segregation is present. 

The outcome of high displacement but with low separation — that is, high D and 
low S — occurs when residential distributions are characterized by “dispersed dis- 
placement.” In the pattern of dispersed displacement, individuals residing in non- 
parity areas are not concentrated in areas where their group predominates. Instead, 
the residential distribution for at least one of the groups — usually the smaller of the 
two groups, which in White-Minority comparisons in US cities is typically, but not 
always, the non-White minority group — is dispersed widely and thinly across non- 
parity areas such that most members of the group live in “mixed” areas where their 
group is not the predominant presence. Indeed, it can be the case that few members 
of the group live in areas where their group is a majority presence and instead most 
members of the group live in areas where the other group in the comparison is the 
predominant group. As a result, under dispersed displacement the two groups in the 
comparison live together in areas with similar racial composition, not apart from 
each other in areas where racial composition is polarized. 


³ The same can be said for any index that ranks segregation comparisons consistent with the 
principle of segregation curve dominance. In addition to the dissimilarity index (D), this includes the 
gini index (G), the symmetric version of the Atkinson index (Aj), and the Hutchens square root 
index (R). 
The contrasting notions of prototypical segregation and dispersed displacement 
can be clarified by comparing two logically possible but fundamentally different 
outcomes that can occur at a given level of displacement. One outcome is that all 
group members not living in parity areas reside in perfectly segregated, homoge- 
neous areas. For example, in the case of White-Black segregation, Whites and 
Blacks not living in parity areas would reside in all-White and all-Black areas, 
respectively. I term this “maximally concentrated displacement.” The other out- 
come is that all group members not residing in parity areas reside in areas that come 
as close to matching parity as is demographically feasible. I term this “maximally 
dispersed displacement.” 

Importantly, D and S behave very differently across these two logical 
possibilities. The value of D will necessarily be the same in both cases. In contrast, 
the value of S will vary across these two cases, potentially by a very large amount. 
For the level of displacement in question, the value of S will take its highest possible 
value, in which case it will equal the value of D, under maximally concentrated 
displacement. S will take its lowest possible value under maximally dispersed dis- 
placement. This leads to a broad rule of thumb for characterizing segregation pat- 
terns. At a given level of displacement, “prototypical segregation” holds when the 
value of S is relatively close to its highest possible value and “dispersed displace- 
ment” holds when the value of S is relatively close to its lowest possible value. 
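
The contrast can be made concrete with a small worked example in Python. The sketch below builds two hypothetical cities with identical group sizes and identical displacement (D = 0.60). In the first, everyone outside parity areas lives in an all-White or all-Black area. In the second, displacement is strongly dispersed (not necessarily maximally dispersed): all Black households share a single area that is about 78 percent White. The function `d_and_s` and all counts are assumptions introduced for illustration.

```python
def d_and_s(areas):
    """Return (D, S) for a list of (white, black) area counts."""
    W = sum(w for w, _ in areas)
    B = sum(b for _, b in areas)
    d = s = 0.0
    for w, b in areas:
        if w + b == 0:
            continue
        gap = w / W - b / B        # difference in group shares located in this area
        if w * B > b * W:          # above-parity area (p_i > P), compared exactly
            d += gap
        s += gap * (w / (w + b))   # weight the gap by area proportion White
    return d, s


# Hypothetical city: 900 White and 100 Black households (P = 0.90).
concentrated = [(540, 0), (0, 60), (360, 40)]  # all-White, all-Black, and parity areas
dispersed = [(540, 0), (360, 100)]             # every Black household in a mostly White area

d_conc, s_conc = d_and_s(concentrated)
d_disp, s_disp = d_and_s(dispersed)
print(round(d_conc, 3), round(s_conc, 3))
print(round(d_disp, 3), round(s_disp, 3))
```

Under the concentrated arrangement S equals D at 0.60. Under the dispersed arrangement D is still 0.60 but S falls to about 0.13, because Blacks' average contact with Whites (about 0.78) is not far below that of Whites (about 0.91).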

Under “prototypical segregation,” D-S combinations are characterized by close 
agreement; their scores roughly correspond at low-low, medium-medium, high- 
high, and so forth. Under “dispersed displacement,” D-S combinations are charac- 
terized by disagreement, sometimes very dramatic disagreement, with scores for D 
being much higher than scores for S. Figure 7.1 places combinations of D and S in 
four general categories based on a two-by-two classification of high and low out- 
comes on the dissimilarity index (D) and the separation index (S). The purpose of 
this simplified presentation is to focus attention on the fundamental differences 
between the logically possible combinations. 

To begin I note that the D-S combination in the upper-left cell of the figure can- 
not occur. As I show below, displacement as measured by D sets the upper limit for 
group separation as measured by S. Accordingly, high values of group separation 
(S) always are accompanied by values of displacement (D) of equal or greater size. 
Consequently, a low-D, high-S combination is not logically possible. The lower-left 
cell (A) is labeled “Low Prototypical Segregation.” It involves a low level of group 
displacement from even distribution (D) and a corresponding low level of group 
separation (S). The upper-right cell (C) is labeled “High Prototypical Segregation.” 
It involves a high level of group displacement from even distribution (D) and a cor- 
responding high level of group separation (S). The lower-right cell (B) is labeled 
“Displacement without Separation.” It involves a high level of displacement from 
even distribution (D) but with levels of group separation substantially below what is 
possible (S). Since this pattern involves dispersed rather than concentrated displace- 
ment, the alternative label of “Dispersed Displacement” also is appropriate. 
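
The claim that D sets the upper limit for S can be checked numerically. The intuition is that S weights each area's difference in group shares by area proportion White (p), which lies between 0 and 1, so the positive differences in above-parity areas can contribute no more to S than they contribute to D. The Python sketch below is only an informal check on randomly generated cities, not a proof; the function name, the random counts, and the seed are all hypothetical choices.

```python
import random


def d_and_s(white, black):
    """Return (D, S) from per-area White and Black counts."""
    W, B = sum(white), sum(black)
    d = s = 0.0
    for w, b in zip(white, black):
        if w + b == 0:
            continue
        gap = w / W - b / B        # difference in group shares located in this area
        if w * B > b * W:          # above-parity area
            d += gap
        s += gap * (w / (w + b))   # gap weighted by area proportion White
    return d, s


rng = random.Random(0)            # fixed seed so the check is repeatable
checked = 0
largest_excess = float("-inf")    # largest observed value of S - D
for _ in range(2000):
    n_areas = rng.randint(2, 40)
    white = [rng.randint(0, 500) for _ in range(n_areas)]
    black = [rng.randint(0, 50) for _ in range(n_areas)]
    if sum(white) == 0 or sum(black) == 0:
        continue  # the indices are undefined when one group is absent
    d, s = d_and_s(white, black)
    largest_excess = max(largest_excess, s - d)
    checked += 1

print(checked, largest_excess <= 1e-12)
```

A small tolerance is used in the comparison only to allow for floating-point rounding.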

Recall from discussion in earlier chapters that the dissimilarity index (D) can be 
characterized as a summary index of group inequality in rank-order position on area 
[Figure 7.1: two-by-two classification. Rows: Value of S (High, Low). Columns: Value of D (Low, High).] 

High S, Low D: 
This outcome cannot occur. 

High S, High D: (C) High Prototypical Segregation (Concentrated Displacement) 
Displacement from even distribution is extensive and it is concentrated so it 
involves maximal group separation and area racial polarization. 
The group difference in percentage attaining parity on area group proportion (p) 
is large and the group difference of means on (p) is large. 
Example generating process: implement as many “segregation-promoting” exchanges 
as possible without changing D. 

Low S, Low D: (A) Low Prototypical Segregation 
Displacement from even distribution is low and group separation and residential 
polarization also are low. 
The group difference in percentage attaining parity on area group proportion 
(p = P) is small and the group difference of means on (p) is small. 
Example generating process: quota allocation or random distribution. 

Low S, High D: (B) Displacement without Separation (Dispersed Displacement) 
Displacement from even distribution is extensive, but it is dispersed and 
involves minimal group separation and area racial polarization. 
The group difference in percentage attaining parity on area group proportion (p) 
is large, but the group difference of means on (p) is small. 
Example generating process: implement as many integration-promoting exchanges 
as possible without changing D. 

Fig. 7.1 Possible combinations of high and low values on displacement (D) and separation (S) 


racial composition. Specifically, in the case of White-Minority segregation, D, like 
the gini index (G), reflects rank-order inequality on area proportion White (p).⁴ 
Similarly, the separation index (S) can be characterized as a summary index of 
group inequality on the original or “raw” quantitative scores on area racial 
composition (p). With this in mind, the four cells in Fig. 7.1 can be described in the 


⁴ Thus, the value of G can be given as twice the value of the group difference in mean percentile 
scores on area group proportion (p). The value of D can be given in the same way based on 
collapsing scoring of area group proportion into two categories of “above parity” (p > P) or not. 
following terms. The lower-left cell (A) “Low Prototypical Segregation” and the 
upper-right cell (C) “High Prototypical Segregation” both involve situations where 
group distributions on area proportion White (p) produce similar levels of 
inequality in rank-order position (D) and quantitative difference (S). The lower- 
right cell (B) “Displacement without Separation” involves a high level of group 
inequality on rank-order position on area proportion White (p) but a low level of 
group inequality on quantitative differences on area proportion White (p). The combination 
indicates that Whites are consistently ranked above Blacks on area proportion 
White, as indicated by the high value of D, but the quantitative differences 
involved are small and thus result in the low value of S. Thus, the rank-order differ- 
ences on area proportion White do not translate into group separation because the 
two groups have similar distributions on area racial composition (p) and thus the 
two populations are living together, not apart from each other. 


7.3 Clarifying the Logical Potential for D-S Concordance 
and Discordance — Analysis of Exchanges 


Scores for D and S can diverge because they assess group differences in residential 
distribution in fundamentally different ways. D measures group differences on area 
proportion White (p) in a crude way; it assesses the group difference in relative 
distribution between two kinds of areas: those that are “above-parity” on area proportion 
White (p) and those that are “below-parity.”⁵ In contrast, S measures group 
difference in area proportion White (p) based on quantitative differences over the 
full distribution of values for area proportion White (p). Thus, where S registers all 
quantitative information about group differences on area proportion White (p), D 
instead collapses this information into a dichotomous rank-order scoring of “above 
P” or not. Thus, at any value of D, the value of S can vary by a considerable amount 
because, unlike D, S registers group differences in distribution on area proportion 
White (p) both within and across “non-parity” areas. 

Methodological studies establish that the potential for scores of D and S to 
diverge traces to two technical differences between D and S. The first is a well- 
known technical deficiency with D. It is that D does not register all integration- 
promoting exchanges of White and Black households between two areas (Reardon 
and Firebaugh 2002).⁶ The value of D changes only for a partial subset of integration-promoting 
exchanges: those that cause at least one of the two areas involved in the 
exchange to move from being above the value of proportion White for the city (P) 
to at or below P when the exchange is completed, or, alternatively, to move from 
being below P to at or above P. When integration-promoting exchanges involve 


⁵ The same quantitative result is obtained if the distinction is “at-or-above-parity” and 
“below-parity.” 

⁶ The nature of integration-promoting and segregation-promoting exchanges is discussed in more 
detail in a separate section below. 
households from areas on the same side of the cut point (P) before and after the 
exchange, the value of D does not change. In contrast, S behaves as accepted prin- 
ciples of segregation measurement require; the value of S goes down when any 
integration-promoting exchange occurs and the value of S goes up when any 
segregation-promoting exchange occurs (Reardon and Firebaugh 2002). 

This provides the initial basis for understanding how the value of S can move 
independently of the value of D. It is that, at any value of D, integration-promoting 
exchanges that involve areas on the same side of overall proportion White (P) before 
and after the exchange will cause the value of S to go down while the value of D 
remains fixed. Similarly, segregation-promoting exchanges that involve areas where 
proportion White (p) is on the same side of overall proportion White (P) before and 
after the exchange will cause the value of S to go up while the value of D remains 
fixed. Under accepted principles of segregation measurement, the changes in values 
of S that take place while D remains constant are desirable; they occur because 
S is registering changes in uneven distribution within non-parity areas. In contrast, 
the non-responsiveness of D is undesirable; it occurs because D is insensitive to 
changes in uneven distribution that are taking place within non-parity areas. 
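
The point can be checked numerically. The following Python sketch is illustrative only: the four-area city, the function names `dissimilarity` and `separation`, and the use of household counts are assumptions made for this example. D is computed as half the sum of absolute differences in area shares, and S as the White-Black difference in average area proportion White.

```python
from fractions import Fraction

def dissimilarity(areas):
    """D = 1/2 * sum over areas of |w/W - b/B| for (white, black) counts."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    return sum(abs(Fraction(w, W) - Fraction(b, B)) for w, b in areas) / 2

def separation(areas):
    """S = Whites' mean area proportion White minus Blacks' mean."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    white_mean = sum(w * Fraction(w, w + b) for w, b in areas) / W
    black_mean = sum(b * Fraction(w, w + b) for w, b in areas) / B
    return white_mean - black_mean

# Hypothetical city with P = 0.50: two above-parity and two below-parity areas.
before = [(90, 10), (70, 30), (40, 60), (0, 100)]
# One White household moves from the third area (p = 0.40) to the fourth
# (p = 0.00) and one Black household moves the other way. Both areas stay
# below parity, so this is a same-side integration-promoting exchange.
after = [(90, 10), (70, 30), (39, 61), (1, 99)]

d_before, d_after = dissimilarity(before), dissimilarity(after)
s_before, s_after = separation(before), separation(after)
print("D:", float(d_before), "->", float(d_after))  # D does not move
print("S:", float(s_before), "->", float(s_after))  # S declines
```

Exact fractions are used so that the equality of D before and after the exchange is not blurred by floating-point rounding.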

There is a second basis for why the value of S can move independently of the 
value of D. It is that, even in cases where D does register the impact of an integration- 
promoting exchange, D has a “flat” or “uniform” response regardless of the impact 
of the exchange on group separation as it relates to the magnitude of the changes in 
area racial composition. In contrast, S responds differentially depending on the 
impact the exchange has on group separation by responding more strongly when the 
two areas involved in the exchange are more “polarized” based on being further 
apart on area proportion White (p). That is to say, all else equal, for any exchange 
producing a change in D, the impact on the value of D will be the same regardless 
of the magnitude of the difference on area proportion White (p) between the two 
areas in the exchange but the impact on the value of S will be larger when the dif- 
ference is larger rather than smaller. This conforms to the substantively appealing 
property that exchanging White and Black households across all-White and all- 
Black areas reduces segregation more than exchanging White and Black households 
across areas that are nearly identical on area proportion White (p). The former 
exchange reduces group separation to a greater degree than the latter exchange 
because it has a larger impact on reducing area racial polarization and White-Black 
differences in distribution on area proportion White (p). 
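
The differential response of S can be given a compact algebraic form. The following is a sketch under assumptions that go beyond the text: the two areas in the exchange have the same size n, the city has T households with proportion White P and proportion Black Q = 1 - P, and S is written in its variance-ratio form, which is algebraically equivalent to the White-Black difference in mean area proportion White. The subscripts i and j denote the area the White household leaves and the area it enters.

```latex
\begin{align*}
S &= \frac{\sum_k t_k\,(p_k - P)^2}{T\,P\,Q} \\[4pt]
\text{Exchange with } p_i > p_j,\; t_i = t_j = n:&\quad
p_i \to p_i - \tfrac{1}{n}, \qquad p_j \to p_j + \tfrac{1}{n} \\[4pt]
\Delta S &= \frac{n\left[\left(p_i - \tfrac{1}{n}\right)^2 - p_i^2
          + \left(p_j + \tfrac{1}{n}\right)^2 - p_j^2\right]}{T\,P\,Q} \\[4pt]
         &= -\,\frac{2\left[(p_i - p_j) - \tfrac{1}{n}\right]}{T\,P\,Q}
\end{align*}
```

Under these assumptions the reduction in S grows linearly with the initial gap between the two areas on proportion White, which is the sense in which S responds more strongly to exchanges between more polarized areas.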

I review the formal basis for this conclusion in the next two sections. I motivate 
the discussion by trying to briefly give an intuitive sense of why the issue is impor- 
tant. At a given level of displacement from even distribution as measured by D, 
households not residing in parity areas can be maximally segregated or minimally 
segregated under the exchange criterion. Under maximal segregation, all possible 
segregation-promoting exchanges that do not change the value of D are implemented. 
The value of S will equal the value of D and White and Black households residing 
in non-parity areas will be separated into maximally polarized, homogeneous areas. 
Under minimal segregation, all possible integration-promoting exchanges that do 
not change the value of D are implemented. The value of S will be very low in com- 
parison to the value of D because White and Black households residing in non- 
parity areas will live together in areas that are relatively similar on racial composition. 
The difference between the two extremes is unquestionably sociologically mean- 
ingful. So it is important to understand how D and S differ in their ability to reveal 
these two fundamentally different residential patterns. 


7.3.1 Overview of D-S Differences in Responding 
to Integration-Promoting Exchanges 


In this section I review how D and S respond to exchanges of White and Black 
households across areas. To begin, I note that uneven distribution emerges when two 
areas with the same racial composition — in the White-Black comparison, the same 
area proportion White (p) — exchange a White and Black household. The area 
receiving the White household and losing the Black household now has a higher 
proportion White and the area losing the White household and receiving the Black 
household now has a lower proportion White. Reversing the exchange restores even 
distribution. Accordingly, an “integration-promoting exchange” is one in which the 
White household in the exchange moves from an area where proportion White (p_i) 
is higher to an area where proportion White (p_j) is lower (i.e., p_i > p_j) and the Black 
household in the exchange moves from an area where proportion White (p_j) is lower 
to an area where proportion White (p_i) is higher (Reardon and Firebaugh 2002:38). 
Conversely, a “segregation-promoting exchange” is one in which the White house- 
hold in the exchange moves from an area where proportion White (p_i) is lower to an 
area where proportion White (p_j) is higher (i.e., p_i < p_j) and the Black household in 
the exchange moves from an area where proportion White (p_j) is higher to an area 
where proportion White (p_i) is lower. 

In the theory of segregation measurement, the “exchange” criterion holds that 
indices should register all integration-promoting and segregation-promoting 
exchanges by decreasing or increasing in value, respectively, when the exchange is 
completed (Reardon and Firebaugh 2002). The separation index (S) meets this cri- 
terion. The dissimilarity index (D) does not. 

I note that it is reasonable to term segregation-promoting exchanges as “polar- 
izing” and “concentrating” and it is similarly appropriate to term integration- 
promoting exchanges as “depolarizing,” “deconcentrating,” and “dispersing.” A 
segregation-promoting exchange is “polarizing” because it moves the two areas 
involved in the exchange further apart on area proportion White since |p_i - p_j| is 
larger after the exchange is completed. At the same time, the exchange is “concen- 
trating” because pairwise same-group contact goes up for both Whites and Blacks 
in the affected areas. Since the residential distribution of Whites and Blacks in other 
areas is unchanged, the result of the exchange is greater overall area polarization, 
greater overall group concentration, greater overall group separation, and a higher 
value of S. 

An integration-promoting exchange is “depolarizing” because it moves the two 
areas involved in the exchange closer together on area proportion White since 
|p_i - p_j| is smaller after the exchange is completed. At the same time, the exchange 
is “deconcentrating” because pairwise same-group contact goes down for both 
Whites and Blacks in the affected areas. Again, since the residential distribution of 
Whites and Blacks in other areas is unchanged, the exchange reduces overall area 
polarization, reduces overall group concentration, reduces overall group separation, 
and lowers the value of S. 
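
The definitions of the two kinds of exchange can be written down directly. The sketch below is a minimal illustration, not a procedure from the text: the function name `classify_exchange`, the tuple representation of areas, and the choice to treat an exchange between areas of identical composition as segregation-promoting (because it creates uneven distribution) are all assumptions made here.

```python
from fractions import Fraction

def classify_exchange(origin, destination):
    """Classify a one-for-one exchange of households between two areas.

    A White household moves from `origin` to `destination` and a Black
    household moves the opposite way. Each area is a (white, black) tuple.
    Returns the label plus the gap |p_i - p_j| before and after the exchange.
    """
    (wi, bi), (wj, bj) = origin, destination
    p_i = Fraction(wi, wi + bi)          # proportion White where the White household starts
    p_j = Fraction(wj, wj + bj)          # proportion White where it ends up
    gap_before = abs(p_i - p_j)
    p_i_after = Fraction(wi - 1, wi + bi)
    p_j_after = Fraction(wj + 1, wj + bj)
    gap_after = abs(p_i_after - p_j_after)
    # Assumption: equal starting compositions are labeled segregation-promoting.
    kind = "integration-promoting" if p_i > p_j else "segregation-promoting"
    return kind, gap_before, gap_after

# White household leaves a 49% White area for a 1% White area: depolarizing.
integrating = classify_exchange((49, 51), (1, 99))
# The reverse move: polarizing.
segregating = classify_exchange((1, 99), (49, 51))
print(integrating)
print(segregating)
```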

Based on this, it is clear that the underlying logic of the separation index (S) reso- 
nates well with the exchange criterion. In contrast, the underlying logic of the dis- 
similarity index (D) is often at odds with the criterion. D registers 
integration-promoting exchanges only when the racial compositions of the two areas 
involved in the exchange are on opposite sides of P, proportion 
White for the city overall. Integration-promoting exchanges that involve areas with 
racial compositions on the same side of P have no impact on D. 

In addition to meeting the minimum requirements for satisfying the exchange 
criterion, the separation index (S) has additional properties that in my opinion are 
desirable for assessing whether groups live apart or together. I list them as 
follows.7 


• All else equal, an integration-promoting exchange produces a larger reduction in 
S when the two areas involved in the exchange are more polarized. 

I term this the “polarization” property with polarization or dispersion being 
based on the initial size of |p_i - p_j|. Substantively, this is appealing because, 
assuming area size is constant, exchanges between more polarized areas reduce 
same-group contact for larger fractions of the affected population. 

Not surprisingly, D does not have this property. 


• All else equal, an integration-promoting exchange produces a larger reduction in 
S when the two areas involved in the exchange are closer to one of the polariza- 
tion boundaries of all-White or all-Black. That is, the reduction is larger when 
the minimum of the two values |p_i - 1| and |p_j - 0| is closer to 0.0. 

The substantive appeal of this characteristic is similar to that for the “polariza- 
tion” property. Here again exchanges that involve areas that are nearer to the 
homogeneous “poles” of 0 and 1 reduce same-group contact for larger fractions 
of the affected population. 

D does not have this property. 


• The “polarization” property holds throughout the full range of area proportion 
White (p). Thus, in contrast to D, integration-promoting exchanges have desir- 
able impacts on reducing S regardless of whether the two areas involved in the 
exchange have racial composition on opposite sides of P (the racial composition 
of the city overall) or on the same side of P. 

This is substantively attractive because it is nonsensical to limit the principle of 
exchanges to apply to exchanges on opposite sides of P (i.e., where p_i > P > p_j). 
It is possible to achieve integration by making only exchanges of this nature. But 
substantial integration also can be achieved with exchanges on the same side of P 
(i.e., where p_i > p_j > P or P > p_i > p_j). 

There is no substantive basis for ignoring the impact of integration-promoting 
exchanges involving areas with racial compositions on the same side of P. 

7I establish these properties by drawing on previous methodological discussions (e.g., Zoloth 
1976; James and Taeuber 1985; Reardon and Firebaugh 2002) and by simulation analyses that 
systematically exercise the possible “event-space” of exchanges between areas in a model city. 


7.3.2 Examples of D-S Differences in Responding 
to Integration-Promoting Exchanges 


To illustrate selected points from the preceding discussion, I compare four 
integration-promoting exchanges for a hypothetical city that is populated by only 
White and Black households and has an overall proportion White of 0.50. For sim- 
plicity, I assume all areas are the same size and are populated with 100 households. 
Under these assumptions, the relative impact of an exchange on S is strictly determined 
by the impact the exchange has on the White-Black difference in segregation- 
relevant average contact with Whites (p) for the 200 households residing in the two 
areas involved in the exchange.8 For the purposes of this discussion I will designate 
this difference with the Greek letter lambda (λ) and express it in percentage form 
(instead of as proportions) for ease of presentation and discussion. 

Figure 7.2 presents results for two pairs of hypothetical exchanges. The first 
panel summarizes results for a pair of integration-promoting exchanges that involve 
areas on opposite sides of P, one above parity and the other below parity. The second 
panel summarizes results for a pair of integration-promoting exchanges that involve 
two areas that are not above parity. I begin by discussing the pair of exchanges in the 
first panel. The first exchange shown involves two areas that are highly polarized on 
racial composition. The first area (Area 1) is an all-White area with 100 White and 
0 Black households. The second area (Area 2) is an all-Black area with 0 White and 
100 Black households. The integration-promoting exchange moves a White house- 
hold from Area 1 (higher p) to Area 2 (lower p) and a Black household from Area 2 
(lower p) to Area 1 (higher p). Following the exchange, Area 1 has 99 White 
households and 1 Black household and Area 2 has 1 White household and 99 Black 
households. 

The integration-promoting exchange could be imagined as two “pioneering” 
residential moves. For example, the exchange could involve the moves of a “pio- 
neering” Black household and a “gentrifying” White household. The pioneering 
Black household leaves a predominantly Black neighborhood and moves to a pre- 
dominantly White neighborhood. The “gentrifying” White household leaves a pre- 
dominantly White neighborhood and moves to a predominantly Black 
neighborhood. 

8The racial composition of all other areas remains unchanged. So any change in S derives 
solely from the impact of changes in the areas involved in the exchange. 


Two Examples of Exchanges Involving Areas on Opposite Sides of P 

                                 First Exchange      Second Exchange
Area Population Distributions    White N   Black N   White N   Black N
Area 1 - Before Exchange             100         0        51        49
Area 2 - Before Exchange               0       100        49        51
Area 1 - After Exchange               99         1        50        50
Area 2 - After Exchange                1        99        50        50

Impact on Index                        S         D         S         D
Initial White Mean (y×100)        100.00    100.00     50.02     51.00
Initial Black Mean (y×100)          0.00      0.00     49.98     49.00
λ Before Exchange (×100)          100.00    100.00      0.04      2.00
Final White Mean (y×100)           98.02     99.00     50.00      0.00
Final Black Mean (y×100)            1.98      1.00     50.00      0.00
λ After Exchange (×100)            96.04     98.00      0.00      0.00
λ Change (×100)                    -3.96     -2.00     -0.04     -2.00


Two Examples of Exchanges Involving Below-Parity Areas 

                                 Third Exchange      Fourth Exchange
Area Population Distributions    White N   Black N   White N   Black N
Area 1 - Before Exchange              49        51        26        74
Area 2 - Before Exchange               1        99        24        76
Area 1 - After Exchange               48        52        25        75
Area 2 - After Exchange                2        98        25        75

Impact on Index                        S         D         S         D
Initial White Mean (y×100)         48.04      0.00     25.04      0.00
Initial Black Mean (y×100)         17.32      0.00     24.99      0.00
λ Before Exchange (×100)           30.72      0.00      0.05      0.00
Final White Mean (y×100)           46.16      0.00     25.00      0.00
Final Black Mean (y×100)           17.95      0.00     25.00      0.00
λ After Exchange (×100)            28.21      0.00      0.00      0.00
λ Change (×100)                    -2.51      0.00     -0.05      0.00


Fig. 7.2 Impacts of selected integration-promoting exchanges on the value of the separation index 
(S) and the dissimilarity index (D) 

In the difference of means framework, the impact of the exchange on an index 
score can be assessed by considering how segregation-relevant residential outcomes 
(y) change for the 200 households in the affected neighborhoods. For S, y is simply 
area proportion White (y = p) so average contact with Whites is initially 100.0 
points for Whites and 0.0 points for Blacks. This yields a value of λ (the White- 
Black average difference for the population in the two areas) of 100.0 points. After 
the exchange, average contact with Whites falls to 98.02 points for Whites in the 
two areas and rises to 1.98 points for Blacks, producing a value of λ of 96.04 points. 
Thus, the exchange causes the average White-Black contact difference for the sub- 
set of affected households, quantified as λ, to fall by 3.96 points. 

The second integration-promoting exchange involves White and Black house- 
holds residing in two areas that differ only slightly on racial composition. Before the 
exchange the first area (Area 1) has 51 White and 49 Black households and the 
second area (Area 2) has 49 White and 51 Black households. After the exchange 
Area 1 and Area 2 both change to 50 White and 50 Black households thus bringing 
about integration. For the subset of households in the affected areas, average 
contact with Whites is initially 50.02 points for Whites and 49.98 points for Blacks, 
producing a White-Black difference (λ) of 0.04 points. After the exchange, average 
contact with Whites falls to 50.00 points for Whites and rises to 50.00 points for 
Blacks, producing a White-Black difference (λ) of 0.00 points. Thus, the exchange 
causes the average White-Black contact difference for the subset of affected house- 
holds (λ) to fall, but only by 0.04 points. 

The larger reduction in λ for S in the first exchange compared to the second 
exchange (3.96 points versus 0.04 points) highlights a property of S discussed 
above and noted previously by Zoloth (1976), James and Taeuber (1985), and 
Reardon and Firebaugh (2002). The property is that S responds more strongly to 
integration-promoting exchanges between areas that are more polarized in terms of 
area racial composition (i.e., exchanges when |p_i - p_j| is larger). The reduction in 
the first exchange is larger by 3.92 points than the reduction in the second exchange 
and in relative terms is 99 times larger. 
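
The figures for the first two exchanges can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only; the function names `lam` and `s_score` and the tuple representation of areas are assumptions made for illustration.

```python
def s_score(p):
    """For S, the residential outcome y is simply area proportion White."""
    return p

def lam(areas, score):
    """White-Black difference (x100) in the mean of y over the given areas.

    `areas` is a list of (white, black) household counts.
    """
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    white_mean = sum(w * score(w / (w + b)) for w, b in areas) / W
    black_mean = sum(b * score(w / (w + b)) for w, b in areas) / B
    return 100 * (white_mean - black_mean)

# (before, after) area counts for the two opposite-side exchanges in Fig. 7.2
exchanges = {
    "first": ([(100, 0), (0, 100)], [(99, 1), (1, 99)]),
    "second": ([(51, 49), (49, 51)], [(50, 50), (50, 50)]),
}
changes = {}
for name, (before, after) in exchanges.items():
    changes[name] = lam(after, s_score) - lam(before, s_score)
    print(name, round(lam(before, s_score), 2), round(lam(after, s_score), 2),
          round(changes[name], 2))
```

Rounded to two decimals, the changes are -3.96 and -0.04 points, matching the S columns of the first panel.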

I view this as sensible and desirable. In substantive terms the first exchange has 
a larger impact on reducing group separation because it does more to “deconcen- 
trate” the group distributions across the two areas because it brings together White 
and Black households from areas that initially were at opposite extremes on area 
racial composition. 

The first exchange reduces Whites’ contact with Whites by a larger amount 
(1.98 points compared to 0.02 points) while simultaneously increasing Blacks’ con- 
tact with Whites by a larger amount (1.98 points compared to 0.02 points). As a 
result, the first exchange reduces the White-Black difference in contact with Whites 
by a larger amount. In contrast, the second exchange has a small impact on reducing 
group separation because it brings people together from areas that initially were 
minimally different on area racial composition. Accordingly, the exchange has less 
impact on group separation as measured by S because the affected White and Black 
households were already living together. 

The relative impact of these integration-promoting exchanges on D can be 
assessed by calculating lambda (λ) in the same manner as just performed for S. The 
only difference is that segregation-relevant contact with Whites (y) is scored differ- 
ently for D than for S. Specifically, contact with Whites (y) is scored 1 for “above 
parity” and 0 otherwise. For D, the relative impact of the exchanges on the value of 
D is the same for both of the exchanges. Specifically, the White-Black difference in 
average (scaled) contact with Whites for the affected households (λ) is reduced by 
two points under both scenarios. The reason for this is that in both cases a single 
White household changes from being scored 1 for “above parity” to being scored 0 
for “not above parity.” Similarly in both cases only a single Black household changes 
from being scored 0 for “not above parity” to being scored 1 for “above parity.” In 
the first exchange, the initial average on contact with Whites as measured by D is 
100.0 for Whites and 0.0 for Blacks, producing an average White-Black difference 
of 100.0 for the population in the affected neighborhoods. After the exchange, the 
average on contact with Whites as measured by D is 99.0 for Whites and 1.0 for 
Blacks resulting in a difference of 98.0. The exchange thus reduces the value of λ 
by 2.0 points. 

The second exchange also produces a reduction in the value of λ of 2.0 points. In 
this case, average contact with Whites as measured by D is initially 51.0 for Whites 
and 49.0 for Blacks, producing an average White-Black difference of 2.0 for the 
population in the affected neighborhoods. After the exchange, the average on con- 
tact with Whites as measured by D is 0.0 for Whites and 0.0 for Blacks resulting in 
a difference of 0.0 since now no one in either group lives in an “above-parity” area. 
The exchange thus reduces the value of λ by 2.0 points, a reduction identical to the 
amount in the first exchange. 

The “flat” or “fixed” response of λ under D can be seen as 
appropriate for the goal of assessing “displacement” conceived narrowly in terms of 
population fractions moving from one side of parity to the other. These fractions are 
the same for both exchange scenarios, so λ is the same for both scenarios. The fact 
that the two exchanges in question have fundamentally different effects on group 
separation and area polarization is not relevant to the narrow conception of displace- 
ment embodied in D. 
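
The flat response can be verified with the same two exchanges by switching to the dichotomous scoring used for D. The sketch below is illustrative; the function name `lam_d` is an assumption, and the scoring rule (y = 1 when area proportion White exceeds P, otherwise 0) follows the description above.

```python
def lam_d(areas, P):
    """White-Black difference (x100) in the share living in above-parity areas.

    `areas` is a list of (white, black) household counts. An area scores
    y = 1 when its proportion White exceeds P and y = 0 otherwise.
    """
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    white_mean = sum(w for w, b in areas if w / (w + b) > P) / W
    black_mean = sum(b for w, b in areas if w / (w + b) > P) / B
    return 100 * (white_mean - black_mean)

P = 0.50  # city-wide proportion White in the hypothetical city
first = ([(100, 0), (0, 100)], [(99, 1), (1, 99)])
second = ([(51, 49), (49, 51)], [(50, 50), (50, 50)])

change_first = lam_d(first[1], P) - lam_d(first[0], P)
change_second = lam_d(second[1], P) - lam_d(second[0], P)
# Both changes come out at about -2 points despite very different polarization.
print(round(change_first, 2), round(change_second, 2))
```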

The contrast of the flat response for D for these two exchanges and the variable 
response for S highlights how displacement and separation are distinct and can vary 
independently. This point is further established by considering the pair of integration- 
promoting exchanges summarized in the second panel of Fig. 7.2. The most impor- 
tant difference between this pair of exchanges and the pair summarized in the top 
panel is that both of the areas in the bottom panel are “below-parity” on area propor- 
tion White. 

The third exchange depicted involves one area (Area 1) with 49 White and 51 
Black households and a second area (Area 2) with 1 White and 99 Black house- 
holds. Both are “below-parity” areas. The integration-promoting exchange involved 
moves a White household from Area 1 (higher p) to Area 2 (lower p) and a Black 
household from Area 2 (lower p) to Area 1 (higher p). Following the exchange Area 
1 changes to 48 White and 52 Black households and Area 2 changes to 2 White and 
98 Black households. 

This exchange involves two areas that differ substantially on racial composition 
with values on area proportion White of 49.0 and 1.0, respectively. The impact on S 
can be assessed as before by examining the value of λ, the White-Black difference 
on segregation-relevant contact with Whites for the population in the affected areas. 
Initially, average contact with Whites as measured for S is 48.04 for Whites and 
17.32 for Blacks yielding a value of λ of 30.72. After the exchange, average contact 
with Whites as measured for S is 46.16 for Whites and 17.95 for Blacks yielding a 
value of λ of 28.21. Thus, under this exchange scenario the White-Black contact dif- 
ference (λ) for the subset of affected households is reduced by 2.51 points. 

The fourth exchange involves two areas that together have the same number of 
White and Black households as the two areas in the third exchange: 50 White and 
150 Black households, respectively. The initial distribution is less polarized than in 
the previous example. One area (Area 1) begins with 26 White and 74 Black house- 
holds and a second area (Area 2) begins with 24 White and 76 Black house- 
holds. The integration-promoting exchange involves moving one White household 
from the area of higher p to the area with lower p and moving one Black household 
from the area of lower p to the area of higher p. Following the exchange, Area 1 and 
Area 2 both change to having 25 White and 75 Black households. As with the third 
exchange, this exchange involves two areas that are “below-parity.” Here, however, 
the two areas initially are very similar on racial composition with area proportion 
White at 0.26 and 0.24, respectively. As a result, the White-Black contact difference 
(λ) for the affected households changes by a very small amount. Initially, average 
contact with Whites as measured by S is 25.04 for Whites and 24.99 for Blacks 
yielding a value of λ of 0.05. After the exchange, average contact with Whites as 
measured by S is 25.00 for Whites and 25.00 for Blacks yielding a value of λ of 0.00. 
Thus, the exchange reduces the White-Black contact difference (λ) for the subset of 
affected households by 0.05 points. 

There are two key findings. One is that both exchanges produce reductions in S 
whereas we will soon see that neither exchange produces a reduction in D. Another 
key finding is that the impact on reducing S is much larger in the third exchange 
than in the fourth exchange. The reduction in the third exchange is 2.46 points larger 
than in the fourth exchange and in relative terms is almost 50 times larger. Again, 
considered in relation to the goal of assessing whether groups live together or apart, 
it is substantively sensible that the third exchange has a bigger relative impact on S 
than the fourth exchange. As previously seen in the first exchange, the third exchange 
does more to “deconcentrate” the group distributions. The third exchange reduces 
Whites’ contact with Whites by a larger amount than the fourth exchange (1.88 
points compared to 0.04 points, respectively) while simultaneously increasing 
Blacks’ contact with Whites by a larger amount (0.63 points compared to 0.01 
points, respectively). 

In substantive terms, the third exchange could be imagined to reflect a “middle- 
stage” integrating sequence where a pioneering Black household leaves a predomi- 
nantly Black neighborhood and moves to a diverse (50/50) neighborhood and a 
gentrifying White household leaves a 50/50 area and moves to a predominantly 
Black area. In contrast, the fourth exchange is a small-impact integrating exchange. 
Like the second exchange reviewed earlier, the two areas involved are near-identi- 
cal in terms of racial composition before the exchange so on balance the households 
affected by the exchange experience minimal changes in neighborhood outcomes 
and very small reductions in pairwise same-group contact. 

The response of D in the third and fourth exchanges in the second panel is easy 
to summarize. D does not change in either case because all households in both 
groups reside in areas that are “below-parity” both before and after the exchanges. 
So again, D has a flat response of no change while S registers a decline in both 
exchanges. The response by S varies from the response by D in two ways. First, S 
responds to both integrating moves while D does not. Second, S responds more 
strongly to the third exchange which clearly reduces group separation and area 
racial polarization by a larger amount. 
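
The two below-parity exchanges can be checked under both scorings at once. The following sketch is illustrative: the helper names `group_means`, `s_score`, and `d_score` are assumptions, and the parity threshold is the city-wide P of 0.50 rather than the proportion White within the two affected areas.

```python
def group_means(areas, score):
    """Return (White mean, Black mean) of the area score y, times 100."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    white = 100 * sum(w * score(w / (w + b)) for w, b in areas) / W
    black = 100 * sum(b * score(w / (w + b)) for w, b in areas) / B
    return white, black

def s_score(p):
    return p                        # S: y is area proportion White

def d_score(p, P=0.50):
    return 1.0 if p > P else 0.0    # D: y is 1 only in above-parity areas

exchanges = {
    "third": ([(49, 51), (1, 99)], [(48, 52), (2, 98)]),
    "fourth": ([(26, 74), (24, 76)], [(25, 75), (25, 75)]),
}
results = {}
for name, (before, after) in exchanges.items():
    for label, score in (("S", s_score), ("D", d_score)):
        w0, b0 = group_means(before, score)
        w1, b1 = group_means(after, score)
        # Store: change in White mean, change in Black mean, change in lambda
        results[name, label] = (w1 - w0, b1 - b0, (w1 - b1) - (w0 - b0))
        print(name, label, [round(x, 2) for x in results[name, label]])
```

Under the S scoring the change in λ is about -2.51 points for the third exchange and -0.05 for the fourth; under the D scoring every entry is zero because no household lives in an above-parity area before or after either exchange.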

One implication from the comparison of how S and D are affected by these two 
exchanges is readily obvious. It is that the value of D can remain fixed while the 
value of S can run higher or lower depending on whether integrating moves involv- 
ing areas that are not above parity reduce polarization or whether segregating moves 
increase polarization. I discuss this more carefully in the next section. 


7.3.3 Implications of Analysis of Example Exchanges 


A couple of important implications follow from these examples of how D and S 
respond to exchanges. I start with integration-promoting exchanges where the two 
areas involved in the exchange are on opposite sides of P. In these exchanges D has 
a “flat” response to all integration-promoting exchanges; its value declines by the 
same amount in all cases. In contrast, S will respond more strongly when the 
exchange is between more polarized areas (and therefore more distant from parity) 
and S will respond less strongly when the exchange is between areas of similar 
racial composition (and therefore closer to parity). This leads to the following 
conclusion. 


Values of D and S can be similar or they can diverge depending on whether displacement 
from even distribution arises from segregation-promoting exchanges that produce maxi- 
mally polarized areas (higher S, closer to D) or minimally polarized areas (lower S, further 
from D) on opposite sides of P. 


The second important implication concerns integration-promoting exchanges 
where both areas involved in the exchange are on the same side of P. D again has a 
“flat” response. It does not change. In contrast, S will always respond and S will 
respond more strongly when the exchange is between more polarized areas and S 
will respond less strongly when the exchange is between areas of similar racial 
composition. This leads to the following conclusion. 


Values of D and S can be similar or they can diverge depending on whether displacement 
from even distribution arises from segregation-promoting exchanges that produce maxi- 
mally polarized areas (higher S, closer to D) or minimally polarized areas (lower S, further 
from D) on the same side of P. 


Stated another way, S will take higher values when the population residing in non- 
parity areas is concentrated to form racially polarized areas and S will take lower 
values when the population residing in non-parity areas is dispersed widely to form 
areas that are similar on racial composition instead of being polarized. 

The practical consequence for D-S comparisons is this. At a given level of dis- 
placement as measured by the dissimilarity index (D), the value of the separation 
index (S) can vary independently and by substantial amounts depending on whether 
group distributions both between “above-parity” areas and “other” areas and within 
“non-parity” areas tend toward maximum area racial polarization or minimum area 
racial polarization. The former concentrates both groups in homogeneous areas and 
maximizes same-group contact and group separation. The latter disperses both 
groups across less homogeneous areas and minimizes same-group contact and 
group separation. 

Ultimately, as I show below, this leads to the following conclusion about the 
relationship between D and S. At a given level of displacement (D), the value of the 
separation index (S) can vary substantially depending on whether group distribu- 
tions within “non-parity” areas tend toward concentration or dispersion. When con- 
centration within non-parity areas is at its maximum, the value of S will equal the 
value of D. But when concentration is at its minimum, that is, when groups are 
maximally dispersed across non-parity areas, the value of S will be lower, some- 
times much lower, than the value of D. 

Intuitively, one can get to these two alternative outcomes via simple steps as fol- 
lows. At a given level of displacement, implement as many segregation-promoting 
exchanges as possible within non-parity areas. If such exchanges can be made, 
group residential distributions will shift toward the pattern of “prototypical segrega- 
tion” and the value of S will increase. The value of D will not change so the D-S 
disparity will decrease. Ultimately, the value of S will rise until it reaches the value 
of D and D-S disparity will be zero. 

Alternatively, implement as many integration-promoting exchanges as possible 
within non-parity areas. If such exchanges can be made, group residential distribu- 
tions will shift toward the pattern of “dispersed displacement” and the value of S 
will decrease. The value of D will not change so D-S disparity will increase. 
Ultimately, S will fall until it reaches its minimum possible level and the D-S 
disparity reaches its maximum. At the conclusion of the process, S will take a value 
substantially below the value of D. 
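
The second of these procedures can be sketched as a simple greedy loop. The code below is a minimal illustration under stated assumptions, not a procedure given in the text: all areas are the same size, the four-area starting city is hypothetical, and an exchange is applied only when the two areas are on the same side of P and differ by at least two White households, which guarantees that the gap between them narrows and that neither area crosses parity.

```python
def d_index(areas):
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    return 0.5 * sum(abs(w / W - b / B) for w, b in areas)

def s_index(areas):
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    white_mean = sum(w * w / (w + b) for w, b in areas) / W
    black_mean = sum(b * w / (w + b) for w, b in areas) / B
    return white_mean - black_mean

def integrate_within_sides(areas):
    """Apply same-side integration-promoting exchanges until none remains.

    Assumes equal-sized areas given as (white, black) household counts.
    """
    areas = [list(a) for a in areas]
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    P = W / (W + B)
    changed = True
    while changed:
        changed = False
        for i in range(len(areas)):
            for j in range(len(areas)):
                (wi, bi), (wj, bj) = areas[i], areas[j]
                p_i, p_j = wi / (wi + bi), wj / (wj + bj)
                same_side = (p_i > P and p_j > P) or (p_i < P and p_j < P)
                if same_side and wi - wj >= 2:
                    areas[i] = [wi - 1, bi + 1]   # White leaves i, Black enters
                    areas[j] = [wj + 1, bj - 1]   # White enters j, Black leaves
                    changed = True
    return [tuple(a) for a in areas]

start = [(90, 10), (60, 40), (40, 60), (10, 90)]   # hypothetical city, P = 0.50
end = integrate_within_sides(start)
print(end)
print(round(d_index(start), 4), round(d_index(end), 4))   # D unchanged
print(round(s_index(start), 4), round(s_index(end), 4))   # S falls
```

In this example D stays at 0.50 while S falls from 0.34 to 0.25. Running the exchanges in the segregating direction instead would push S upward toward D.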


7.4 Clarifying the Potential for D-S Concordance 
and Discordance — Analytic Models 


I further clarify the potential for both D-S agreement and disagreement by review- 
ing a series of analytic exercises that illustrate how group separation and area polar- 
ization (S) can vary independently from the level of displacement as measured by 
dissimilarity (D) while holding city-level racial composition (P) constant. To keep 
the exercises simple and easier to follow, I limit the hypothetical city to only three 
kinds of neighborhoods designated as Areas 1, 2, and 3 with the following 
characteristics. 


Area 1 is “Above parity” (i.e., disproportionately White with p_1 > P and q_1 < Q) 

Area 2 is at “Parity” (i.e., exactly average on proportion White with p_2 = P and 
q_2 = Q) 

Area 3 is “Below parity” (i.e., disproportionately Black with p_3 < P and q_3 > Q) 


The model can be extended to allow for more variation in area racial composition, 
but this provides no benefit for present purposes. 

I first note that, at a given level of displacement from even distribution as 
registered by D, S will take its maximum value of S = D when the population residing 
in non-parity areas is maximally concentrated. This occurs when non-parity areas 
are either all-White or all-Black and thus are perfectly “polarized” as either 1.0 or 
0.0 on area proportion White ($p_i$). This result can be produced by a “Maximum 
Concentration” or “Maximum S” algorithm involving three steps as follows. 


1. Set the share of Whites in Area 1 to D (i.e., $s_{W1} = w_1 / W = D$). Proportion 
White in the area will be 1.0 (i.e., $p_1 = 1.0$). 

2. Set the share of Blacks in Area 3 to D (i.e., $s_{B3} = b_3 / B = D$). Proportion 
White will be 0.0 (i.e., $p_3 = 0.0$). 

3. Place remaining Whites and Blacks in Area 2. Area share scores for Whites and 
Blacks will be $s_{W2} = s_{B2} = (1 - D)$ and proportion White for the area will be at 
parity (i.e., $p_2 = P$). 


The resulting group distributions will produce a distinctive “four-point” segregation 
curve that Duncan and Duncan (1955: Figure 5) termed a “William’s model” 
segregation curve. In this distribution, S takes its maximum possible value (Smax) under 
the prevailing level of displacement from even distribution with Smax = D. 
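
The three steps above can be sketched in Python as a quick check that S equals D under maximum concentration whatever the value of P. This is only an illustrative sketch, not code from the monograph: the function names are hypothetical, D and P are entered as proportions, and the share-based formula for D (half the summed absolute difference in group shares) is the standard one rather than a formula quoted from this section.

```python
def max_concentration(D, P):
    """Return group shares, total shares, and proportion White for Areas 1-3
    under the three-step 'Maximum S' algorithm (D and P given as proportions)."""
    Q = 1.0 - P
    s_w = [D, 1.0 - D, 0.0]   # White shares: Area 1 holds share D, Area 2 the rest
    s_b = [0.0, 1.0 - D, D]   # Black shares: Area 3 holds share D, Area 2 the rest
    s_t = [P * w + Q * b for w, b in zip(s_w, s_b)]  # total population shares
    # Proportion White per area; an empty area is scored at parity (P) here,
    # which is a convention of this sketch and has no effect on S.
    p = [P * w / t if t > 0 else P for w, t in zip(s_w, s_t)]
    return s_w, s_b, s_t, p


def dissimilarity(s_w, s_b):
    """D as half the summed absolute difference in group shares."""
    return 0.5 * sum(abs(w - b) for w, b in zip(s_w, s_b))


def separation(s_w, s_b, p):
    """S as the White-Black difference in mean area proportion White."""
    whites = sum(w * pi for w, pi in zip(s_w, p))
    blacks = sum(b * pi for b, pi in zip(s_b, p))
    return whites - blacks


def separation_variance_form(s_t, p, P):
    """S as the share-weighted squared deviation of p_i from P, divided by PQ."""
    return sum(t * (pi - P) ** 2 for t, pi in zip(s_t, p)) / (P * (1.0 - P))


if __name__ == "__main__":
    for P in (0.50, 0.70, 0.90, 0.98):
        s_w, s_b, s_t, p = max_concentration(0.60, P)
        print(P, dissimilarity(s_w, s_b), separation(s_w, s_b, p))
```

Both forms of S are included so the sketch can confirm that they agree with each other as well as with D.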

This establishes that the logical upper bound on separation (S) is the level of 
displacement (D). In addition, since D can vary independently of racial composition 
(P) and S can always match D, this result also establishes that group separation (S) 
can vary independently of city racial composition (P). This finding lays to rest any 
claim that the value of S is inherently dependent on city racial composition. S can 
match D when displacement is concentrated. Whether displacement is concentrated 
or not depends on sociological dynamics governing population distribution, not the 
inherent nature of S. 

The next issue to take up is whether D and S can vary independently. This is 
relatively easy to establish as S will take a lower value than D when groups residing in 
non-parity areas are dispersed rather than concentrated.⁹ When groups residing in 
non-parity areas are concentrated, higher values of S result and in the situation of 
complete concentration S reaches a maximum value of D. When groups residing in 
non-parity areas are dispersed widely, values of S will be substantially lower than 
values of D. When groups are exactly equal in size, a relatively uncommon but 
logically possible situation, the value of S can fall to at least $D^2$. In cases where groups 
are unequal in size, values of S can fall well below $D^2$ and in some circumstances S 
can potentially fall to very low values.¹⁰ 

A variety of algorithms will produce patterns of dispersed displacement from 
even distribution that yield low values of S while maintaining a specified value of 
D. In a more detailed discussion of this issue (Fossett 2015) I review a progression 
of algorithms. For present purposes, I introduce an algorithm that produces the low- 
est levels of S I have been able to obtain under the three-area scenario under discus- 
sion. This “Minimum S” (Smin) algorithm actually uses just two areas, one area that 
is “above parity” and one that is “below parity”. 

The algorithm to obtain Smin involves two variations which I term here Model A1 
and Model A2. Each version will produce the lower value of S over some ranges of 
city racial composition (P) as follows. 


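$$
\begin{aligned}
\text{if } (P > 0.5) &: \quad S_{\text{Min A1}} = S_{\text{Min}} \le S_{\text{Min A2}} \\
\text{if } (P = 0.5) &: \quad S_{\text{Min A1}} = S_{\text{Min}} = S_{\text{Min A2}} \\
\text{if } (P < 0.5) &: \quad S_{\text{Min A1}} \ge S_{\text{Min}} = S_{\text{Min A2}}
\end{aligned}
$$

Accordingly, one can obtain the value of Smin by assigning the value of S generated 
by Model A1 when P ≥ 0.5 and the value of S generated by Model A2 when P < 0.5. 
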

Both versions of the algorithm proceed to an intermediate step with one 
homogeneous area and one mixed area. The A1 version of the algorithm begins as 
follows. 


A1 Step 1. For Area 1, set the group share for Whites ($s_{W1}$) to D and the group share 
for Blacks ($s_{B1}$) to 0.0. 

A1 Step 2. For Area 3, set the group share of Whites ($s_{W3}$) to 1 - D and the group 
share for Blacks ($s_{B3}$) to 1.0. 


This produces an “above-parity” area (Area 1) that is all-White and a “below-parity” 
area (Area 3) that is mixed White and Black. 
Similarly, the A2 version of the algorithm begins as follows. 


A2 Step 1. For Area 1, set the group share for Whites ($s_{W1}$) to 1.0 and the group share 
for Blacks ($s_{B1}$) to 1 - D. 

A2 Step 2. For Area 3, set the group share for Whites ($s_{W3}$) to 0.0 and the group share 
for Blacks ($s_{B3}$) to D. 


This produces an “above-parity” area (Area 1) that is mixed White and Black and a 
“below-parity” area (Area 3) that is all-Black. 


9 The exceptions are when D is close to boundary values of 0 and 1.0. Under these conditions, 
scores for all popular measures of uneven distribution will agree. 

10 I offer this conclusion based on exercising the models discussed here over the full “event space” 
of possible combinations of D and P. In all instances where 0 < D < 1, I obtained values of S 
below the value of $D^2$. 

For most logically possible combinations of D and P the value of S can be 
reduced even further by transferring an optimal amount (X) of equal shares of 
Whites and Blacks from the “mixed” area to the homogeneous area to reduce 
concentration (increase dispersion). For Model A1 these transfers move equal group 
shares (X) of Whites and Blacks from Area 3, which is mixed, to Area 1, which is 
all-White. For Model A2, these transfers move equal group shares (X) of Whites 
and Blacks from Area 1, which is mixed, to Area 3, which is all-Black. 

The value of D is unaffected when equal group shares are transferred from one 
area to another. But the transfers can have substantial impacts on the value of S. A 
wide range of alternative group transfer share values are logically possible subject 
to the restriction that the transfers cannot produce area group share values below 0.0 
or above 1.0. The task is to find the optimal value (X) that will reduce the value of 
S to Smin, the lowest possible value under the three-area model under consideration. 
One strategy is to conduct a numerical search over the feasible values of X. I 
developed algorithms that implemented this approach and used them to establish 
benchmarks for what is possible. Using this approach I found I could obtain the same 
result for Smin regardless of whether starting from the residential distributions 
created at the intermediate steps of Model A1 or Model A2. 
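
A numerical search of the kind just described might look like the following Python sketch. It is an illustration under stated assumptions, not the author's algorithm: it starts from the Model A1 intermediate distribution, uses the Model A1 expressions for proportion White listed in Fig. 7.3 together with the identity S = D(p1 - p3), and scans an evenly spaced grid of transfer shares X between 0 and 1 - D. The function names and the grid size are hypothetical choices, and D and P are entered as proportions with 0 < D < 1 and 0 < P < 1.

```python
def s_model_a1(D, P, X):
    """S for Model A1 after moving equal group shares X from Area 3 to Area 1."""
    p1 = P * (D + X) / (P * D + X)                 # proportion White, Area 1
    p3 = P * (1.0 - D - X) / (1.0 - P * D - X)     # proportion White, Area 3
    return D * (p1 - p3)


def grid_search_min_s(D, P, steps=10000):
    """Scan X over [0, 1 - D] and return the (X, S) pair with the lowest S found."""
    best_x, best_s = 0.0, s_model_a1(D, P, 0.0)
    for k in range(1, steps + 1):
        x = (1.0 - D) * k / steps
        s = s_model_a1(D, P, x)
        if s < best_s:
            best_x, best_s = x, s
    return best_x, best_s


if __name__ == "__main__":
    for P in (0.70, 0.90):
        x, s = grid_search_min_s(0.60, P)
        print(f"P={P:.2f}  X={x:.4f}  S={s:.4f}")
```

In this sketch, D = 0.60 with P = 0.70 bottoms out near X = 0.08 with S close to 0.3024. With P = 0.90 the grid minimum falls at X = 0 with S near 0.1304, marginally lower than the S of about 0.1330 that the same formulas give at X = 0.04, so a search of this kind is a useful cross-check on any closed-form choice of X.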

With additional exploration I discovered that the same residential distributions 
and resulting value of Smin can be obtained using a direct analytic solution. This 
solution involves modifying the transfer of equal group shares so it brings the share 
of total population (i.e., Whites and Blacks combined) in “above-parity” and 
“below-parity” areas as close to 0.5 as possible. This is accomplished as follows. 
First, identify the range of logically possible share transfer values (X) that will 
maintain the value of D. These will range from a minimum of 0.0 to a maximum of 
(1 - D). Next calculate the value of $|s_{T3} - 0.5|$, the unsigned difference between the 
total population share in Area 3 and 0.5. Tentatively adopt this as the share amount 
(X) to be transferred. If the value of X is larger than the maximum feasible value 
(1 - D), set the group transfer share value (X) to (1 - D). In other words, set X to the 
minimum of $|s_{T3} - 0.5|$ and (1 - D). Next implement the transfer of the identified 
group share amounts (X) from the mixed area to the homogeneous area. 

Thus, when P ≥ 0.5, use the A1 algorithm with these additional steps. 


A1 Step 3. Set the optimal share (X) of Whites and Blacks to transfer from the 
mixed area (Area 3) to the all-White area (Area 1) as the minimum of $|s_{T3} - 0.5|$ 
and (1 - D). 

A1 Step 4. Implement the transfer, thus increasing $s_{W1}$ to D + X and $s_{B1}$ to X and 
reducing $s_{W3}$ to 1 - D - X and $s_{B3}$ to 1 - X. 

When P < 0.5, use the A2 algorithm with these additional steps. 


A2 Step 3. Set the optimal share (X) of Whites and Blacks to transfer from the 
mixed area (Area 1) to the all-Black area (Area 3) as the minimum of $|s_{T3} - 0.5|$ 
and (1 - D). 

A2 Step 4. Implement the transfer, thus reducing $s_{W1}$ to 1 - X and $s_{B1}$ to 1 - D - X and 
increasing $s_{W3}$ to X and $s_{B3}$ to D + X. 
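
The four steps of each model can be written out as a short Python sketch. It follows the steps literally as stated, including the rule that X is the smaller of |s_T3 - 0.5| and (1 - D), and makes no independent claim that the result is the lowest attainable S. The function name and the dictionary layout are hypothetical, D and P are entered as proportions, and the starting Area 3 total share is taken as 1 - PD for Model A1 and QD for Model A2.

```python
def dispersed_displacement(D, P):
    """Apply Model A1 (P >= 0.5) or Model A2 (P < 0.5), Steps 1-4, as stated.

    Returns the transfer share X, White and Black shares and proportion White
    for Areas 1 and 3, and the resulting value of S.
    """
    Q = 1.0 - P
    if P >= 0.5:
        # Model A1: Area 1 starts all-White, Area 3 starts mixed.
        s_t3 = 1.0 - P * D
        X = min(abs(s_t3 - 0.5), 1.0 - D)
        s_w = {1: D + X, 3: 1.0 - D - X}
        s_b = {1: X, 3: 1.0 - X}
    else:
        # Model A2: Area 1 starts mixed, Area 3 starts all-Black.
        s_t3 = Q * D
        X = min(abs(s_t3 - 0.5), 1.0 - D)
        s_w = {1: 1.0 - X, 3: X}
        s_b = {1: 1.0 - D - X, 3: D + X}
    # Total population share and proportion White in each non-parity area.
    s_t = {i: P * s_w[i] + Q * s_b[i] for i in (1, 3)}
    p = {i: P * s_w[i] / s_t[i] for i in (1, 3)}
    # S as the White-Black difference in mean area proportion White.
    S = sum(s_w[i] * p[i] for i in (1, 3)) - sum(s_b[i] * p[i] for i in (1, 3))
    return X, s_w, s_b, p, S


if __name__ == "__main__":
    for P in (0.90, 0.30):
        X, s_w, s_b, p, S = dispersed_displacement(0.60, P)
        print(f"P={P:.2f}  X={X:.2f}  p1={p[1]:.4f}  p3={p[3]:.4f}  S={S:.4f}")
```

Because equal group shares are moved, the White-Black share difference stays at D in Area 1 and at -D in Area 3, so the value of D is unchanged by the transfer.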


7.4.1 Examples of Calculating Values of Smin Given Values 
of D and P 


Figure 7.3 provides a summary listing of formulas for calculating terms relating to 
group residential distributions under the “Maximum S” and “Minimum S” analytic 
models just introduced. I establish the basis for the formulas in a more detailed 
review of analytic models for group separation (Fossett 2015). The formulas in 
Fig. 7.3 establish how, in the context of the three-area analytic model considered 
here, algorithms for dispersed and concentrated displacement will generate group 
residential distributions producing lower and higher values of group separation (S) 
under a given combination of fixed values for displacement (D) and city racial com- 
position (P). As best I have been able to determine, the formulas in Fig. 7.3 establish 
the logically possible range for S under a given combination of D and P by yielding 
the minimum possible value for S (Smin) under dispersed displacement and the max- 
imum possible value for S (Smax) under concentrated displacement. 

In this section I review examples to illustrate how values of Smin and Smax can be 
calculated for a given combination of displacement (D) and city racial composition 
(P). The value of S under the “Maximum S” algorithm can be obtained by using the 
formulas in Fig. 7.3 to first establish the values of relevant component terms, namely 
area group share distributions ($s_{Wi}$ and $s_{Bi}$) and area group proportions ($p_i$), used in 
computing formulas for S and then carrying through the calculations to obtain S. 

Consideration of the two general computing formulas for S given below (as well 
as earlier) reveals that the “parity area” in the three-area analytic model under con- 
sideration can be ignored because calculations for this area yield values of zero (0) 
and have no impact on the value of S. 


$$ S = \sum_i s_{Ti}\,(p_i - P)^2 / PQ, \text{ and} $$

$$ S = \sum_i s_{Wi}\,p_i - \sum_i s_{Bi}\,p_i . $$


The value of S thus results from the calculations for the “above parity” and “below 
parity” areas and can be given as either 


$$ S = (s_{W1}\,p_1 + s_{W3}\,p_3) - (s_{B1}\,p_1 + s_{B3}\,p_3), \text{ or} $$

                        Area 1              Area 2       Area 3 
                        Above Parity        Parity       Below Parity 
                        (p_i > P)           (p_i = P)    (p_i < P) 

Smax, S = D under Maximum Concentration Model 
  White Share (s_Wi)    D                   1-D          --- 
  Black Share (s_Bi)    ---                 1-D          D 
  Total Share (s_Ti)    PD                  1-D          QD 
  Prop. White (p_i)     1                   P            0 

Smin A1, S under Dispersed Displacement Model A1 
  White Share (s_Wi)    D+X                 ---          1-D-X 
  Black Share (s_Bi)    X                   ---          1-X 
  Total Share (s_Ti)    PD+X                ---          1-PD-X 
  Prop. White (p_i)     P(D+X)/(PD+X)       ---          P(1-D-X)/(1-PD-X) 

Smin A2, S under Dispersed Displacement Model A2 
  White Share (s_Wi)    1-X                 ---          X 
  Black Share (s_Bi)    1-D-X               ---          D+X 
  Total Share (s_Ti)    P+Q(1-D)-X          ---          QD+X 
  Prop. White (p_i)     P(1-X)/(1-QD-X)     ---          PX/(QD+X) 

Fig. 7.3 Summary of formulas for group residential distributions by level of dissimilarity (D) and 
racial composition (P) under selected algorithms for producing concentrated and dispersed 
displacement from even distribution (Notes: Per discussion in text, X = min(|s_T3 - 0.5|, (1 - D)) where 
s_T3 is (1 - PD) under Model A1 and QD under Model A2, respectively) 


$$ S = (s_{W1} - s_{B1})\,p_1 + (s_{W3} - s_{B3})\,p_3 . $$


Taking the example combination of displacement as measured by the dissimilar- 
ity index (D) set to 60 and pairwise city proportion White (P) set to 0.90, the result- 
ing value of S under the “Maximum S” Model can be obtained by first establishing 
the values of relevant component terms and then carrying through the computations. 
The relevant component terms for the non-parity areas can be obtained as follows. 


$$
\begin{aligned}
s_{W1} &= D = 0.60 \\
s_{W3} &= 0.0 \\
s_{B1} &= 0.0 \\
s_{B3} &= D = 0.60 \\
(s_{W1} - s_{B1}) &= D - 0.0 = D = 0.60 \\
(s_{W3} - s_{B3}) &= 0.0 - D = -D = -0.60 \\
p_1 &= 1.0 \\
p_3 &= 0.0
\end{aligned}
$$


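The following calculations now demonstrate that Smax = D. 


$$
\begin{aligned}
S &= (s_{W1}\,p_1 + s_{W3}\,p_3) - (s_{B1}\,p_1 + s_{B3}\,p_3) \\
  &= (0.60 \cdot 1.0 + 0.0 \cdot 0.0) - (0.0 \cdot 1.0 + 0.60 \cdot 0.0) = 0.60
\end{aligned}
$$


$$ S = (s_{W1} - s_{B1})\,p_1 + (s_{W3} - s_{B3})\,p_3 = 0.60 \cdot 1.0 + (-0.60) \cdot 0.0 = 0.60 $$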


This expression reveals something interesting and important. It is this. 


The value of P is not directly involved in the formulas for the component terms. This indi- 
cates that the value of Smax is unaffected by city racial composition. Accordingly, under 
concentrated displacement, S can equal D for any city racial composition. 


The calculations for “Minimum S” (Smin) under dispersed displacement are more 
involved. Model A1 applies when city racial composition (P) is ≥ 0.50 and thus 
would be the relevant model for most White-Minority comparisons in US cities. 
Model A1 also is relevant for the example just considered where D is 60 and 
pairwise city proportion White (P) is 0.90. The value of S under Model A1 can be 
obtained by first establishing the values of relevant component terms and then 
carrying through subsequent calculations. The relevant component terms can be 
obtained as follows. 


$$
\begin{aligned}
X &= \min(|(1 - PD) - 0.5|,\ (1 - D)) = \min(|(1 - 0.90 \cdot 0.60) - 0.5|,\ (1 - 0.60)) \\
  &= \min(|(1 - 0.54) - 0.5|,\ 0.40) = \min(0.04,\ 0.40) \\
  &= 0.04 \\
s_{W1} &= D + X = 0.60 + 0.04 = 0.64 \\
s_{W3} &= (1 - D - X) = 1 - 0.60 - 0.04 = 0.36 \\
s_{B1} &= X = 0.04 \\
s_{B3} &= 1 - X = 1 - 0.04 = 0.96 \\
(s_{W1} - s_{B1}) &= (D + X) - X = D = 0.60 \\
(s_{W3} - s_{B3}) &= (1 - D - X) - (1 - X) = -D = -0.60 \\
p_1 &= P(D + X) / (PD + X) = 0.90(0.60 + 0.04) / (0.90 \cdot 0.60 + 0.04) \\
    &= 0.576 / 0.58 = 0.9930 \\
p_3 &= P(1 - D - X) / (1 - PD - X) = 0.90(1 - 0.60 - 0.04) / (1 - 0.90 \cdot 0.60 - 0.04) \\
    &= (0.90 \cdot 0.36) / (1 - 0.58) = 0.324 / 0.42 = 0.7714
\end{aligned}
$$


Note that $(s_{W1} - s_{B1})$ resolves to D and $(s_{W3} - s_{B3})$ resolves to -D. As a result, the 
expression 


$$ S = (s_{W1} - s_{B1})\,p_1 + (s_{W3} - s_{B3})\,p_3 $$

can be restated in the following convenient computing formula. 

$$ S = D\,(p_1 - p_3) $$


The following calculations illustrate that any of the three expressions can be used 
to obtain Smin = 0.1330 under Model A1. 

$$
\begin{aligned}
S &= (s_{W1}\,p_1 + s_{W3}\,p_3) - (s_{B1}\,p_1 + s_{B3}\,p_3) \\
  &= (0.64 \cdot 0.9930 + 0.36 \cdot 0.7714) - (0.04 \cdot 0.9930 + 0.96 \cdot 0.7714) \\
  &= (0.6355 + 0.2777) - (0.0397 + 0.7405) = 0.9132 - 0.7802 = 0.1330
\end{aligned}
$$


$$
\begin{aligned}
S &= (s_{W1} - s_{B1})\,p_1 + (s_{W3} - s_{B3})\,p_3 = (0.64 - 0.04)\,0.9930 + (0.36 - 0.96)\,0.7714 \\
  &= 0.60 \cdot 0.9930 - 0.60 \cdot 0.7714 = 0.5958 - 0.4628 = 0.1330
\end{aligned}
$$


$$ S = D \cdot (p_1 - p_3) = 0.60 \cdot (0.9930 - 0.7714) = 0.60 \cdot 0.2216 = 0.1330 $$


Model A2 applies when the city racial composition (P) is < 0.50. Typically this 
model is not relevant for most White-Minority comparisons in US cities. But it is 
occasionally relevant, perhaps most often for White-Latino comparisons in the bor- 
der region of the southwestern United States. As with Model A1, the value of S can 
be obtained from any of the following three equivalent expressions. 


$$ S = (s_{W1}\,p_1 + s_{W3}\,p_3) - (s_{B1}\,p_1 + s_{B3}\,p_3) $$


$$ S = (s_{W1} - s_{B1})\,p_1 + (s_{W3} - s_{B3})\,p_3 $$


$$ S = D\,(p_1 - p_3) $$


Thus, for example, if D is 60 and pairwise city proportion White (P) is 0.30 (similar 
to the value of P for many White-Latino comparisons in Texas border region cities) 
the resulting value of S under Model A2 can be obtained by first establishing the 
values of relevant component terms and then carrying through computations. The 
relevant component terms are as follows. 

$$
\begin{aligned}
X &= \min(|(1 - QD) - 0.5|,\ (1 - D)) \\
  &= \min(|(1 - 0.70 \cdot 0.60) - 0.5|,\ (1 - 0.60)) \\
  &= \min(|(1 - 0.42) - 0.5|,\ 0.40) = \min(|0.58 - 0.5|,\ 0.40) = \min(0.08,\ 0.40) \\
  &= 0.08 \\
s_{W1} &= 1 - X = 1 - 0.08 = 0.92 \\
s_{W3} &= X = 0.08 \\
s_{B1} &= 1 - D - X = 1 - 0.60 - 0.08 = 0.32 \\
s_{B3} &= D + X = 0.60 + 0.08 = 0.68 \\
(s_{W1} - s_{B1}) &= (1 - X) - (1 - D - X) = (1 - 1) + D + (X - X) = D = 0.60 \\
(s_{W3} - s_{B3}) &= X - (D + X) = -D = -0.60 \\
p_1 &= P(1 - X) / (1 - QD - X) \\
    &= 0.30(1 - 0.08) / (1 - 0.70 \cdot 0.60 - 0.08) = 0.30 \cdot 0.92 / (1 - 0.42 - 0.08) \\
    &= 0.276 / 0.50 = 0.552 \\
p_3 &= PX / (QD + X) = 0.30 \cdot 0.08 / (0.70 \cdot 0.60 + 0.08) = 0.024 / 0.50 = 0.048
\end{aligned}
$$


The following calculations illustrate that Smin = 0.3024 under Model A2 can be 
obtained using any one of the following three expressions. 

$$
\begin{aligned}
S &= (s_{W1}\,p_1 + s_{W3}\,p_3) - (s_{B1}\,p_1 + s_{B3}\,p_3) \\
  &= (0.92 \cdot 0.552 + 0.08 \cdot 0.048) - (0.32 \cdot 0.552 + 0.68 \cdot 0.048) \\
  &= (0.5078 + 0.0038) - (0.1766 + 0.0326) = 0.5116 - 0.2092 = 0.3024
\end{aligned}
$$


$$
\begin{aligned}
S &= (s_{W1} - s_{B1})\,p_1 + (s_{W3} - s_{B3})\,p_3 \\
  &= (0.92 - 0.32)\,0.552 + (0.08 - 0.68)\,0.048 \\
  &= (0.60)\,0.552 + (-0.60)\,0.048 \\
  &= 0.3312 - 0.0288 = 0.3024
\end{aligned}
$$


$$
\begin{aligned}
S &= D \cdot (p_1 - p_3) \\
  &= 0.60 \cdot (0.552 - 0.0480) = 0.60 \cdot 0.5040 = 0.3024
\end{aligned}
$$
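
The equivalence of the three computing expressions can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names. The inputs are the component terms from the two worked examples, with the proportions for the first example entered as the unrounded ratios 0.576/0.58 and 0.324/0.42.

```python
def s_group_means(s_w, s_b, p):
    """S as (sum of s_Wi * p_i) minus (sum of s_Bi * p_i) over Areas 1 and 3."""
    return (s_w[0] * p[0] + s_w[1] * p[1]) - (s_b[0] * p[0] + s_b[1] * p[1])


def s_share_differences(s_w, s_b, p):
    """S as (s_W1 - s_B1) * p_1 + (s_W3 - s_B3) * p_3."""
    return (s_w[0] - s_b[0]) * p[0] + (s_w[1] - s_b[1]) * p[1]


def s_compact(D, p):
    """S as D * (p_1 - p_3)."""
    return D * (p[0] - p[1])


# Component terms for Areas 1 and 3 taken from the worked examples in the text.
examples = {
    "Model A1 (D=0.60, P=0.90)": ((0.64, 0.36), (0.04, 0.96), (0.576 / 0.58, 0.324 / 0.42)),
    "Model A2 (D=0.60, P=0.30)": ((0.92, 0.08), (0.32, 0.68), (0.552, 0.048)),
}

if __name__ == "__main__":
    for label, (s_w, s_b, p) in examples.items():
        values = (
            s_group_means(s_w, s_b, p),
            s_share_differences(s_w, s_b, p),
            s_compact(0.60, p),
        )
        print(label, [round(v, 4) for v in values])
```

The compact form depends on the share differences being exactly D and -D, which holds here because the transfers move equal group shares.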

7.4.2 Examining D, Smax, and Smin over Varying Combinations 
of D and P 


The models for obtaining maximum and minimum values of the separation index 
(S) just reviewed provide a basis for establishing the potential for D and S to vary 
across varying combinations of the level of displacement from even distribution as 
measured by the dissimilarity index (D) and the racial composition of the city (P). I 
used these models to compute values of Smax and Smin over possible combinations of 
D ranging from 0 to 100 with P ranging from 1 to 99. Results from these 
calculations are depicted graphically in Figs. 7.4 and 7.5, which show the upper and lower 
bounds of the relationship between D and S at selected values for city racial 
composition (P). Figure 7.4 depicts the relationship by plotting values of Smax and Smin 
against values of D. Figure 7.5 depicts the relationship by plotting values of D 
against values of Smin. 

I comment first on the diagonal line on Fig. 7.4. This results from plotting values 
of Smax against the value of D over all values of D and all values of P. The diagonal 
documents that S will equal D at any combination of values for D and P when 
displacement from parity involves concentration of both groups in racially polarized 
areas wherein Whites in non-parity areas live apart from Blacks in areas that are 
all-White and Blacks in non-parity areas live apart from Whites in areas that are 
all-Black. The diagonal in the figure thus serves as a reference line indicating the 
maximum degree to which groups can be residentially separated at a given level of 
displacement from even distribution. 


[Figure 7.4 image not reproduced: vertical axis Separation Index, horizontal axis Dissimilarity Index] 

Fig. 7.4 Maximum and minimum values of the separation index (S) by values of the dissimilarity 
index (D) for selected values of city percent White (P) under a three-area analytic model (Notes: 
Maximum and minimum values of S under three-area analytic exercise. See text for discussion of 
analytic model. Curves are plotted for values of percent White (P) of 50, 70, 80, 90, 95, and 98) 


[Figure 7.5 image not reproduced: vertical axis Dissimilarity Index, horizontal axis Separation Index (0 to 100)] 

Fig. 7.5 Maximum and minimum values of the dissimilarity index (D) by values of the separation 
index (S) for selected values of city percent White (P) under a three-area analytic model (Notes: 
Maximum and minimum values of S under three-area analytic exercise. See text for discussion of 
analytic model. Curves are plotted for values of percent White (P) of 50, 70, 80, 90, 95, and 98) 

The graph in Fig. 7.4 also plots the values of Smin against the value of D over 
values of D ranging from 0 to 100 and at selected values of P ranging from 2 to 50. 
Note that it is not necessary to plot the same relationships for values of P above 50 
because they are identical to the relationships already shown for values of 1 - P. 
Thus, for example, the curve obtained when P = 98 is identical to the curve 
obtained when P = 2. Importantly, all of the curves fall below the diagonal and thus 
visually depict the fact that S can take a lower value than D at any combination of 
values for D and P when group displacement from even distribution is dispersed in 
a way that maximizes group residential mixing instead of being concentrated in a 
way that maximizes group residential separation. The set of curves also makes it 
clear that the maximum possible difference between D and S is conditioned by city 
racial composition (P). This is visually indicated by the fact that different curves 
result for each value of P. 

The maximum possible size of the D-S difference is smallest when the two 
groups in the comparison are equal in size (i.e., P = Q = 0.5). Intuitively, this is 
because the maximum departure of S from D occurs when one group is dispersed 
widely across areas where it is over-represented, thus resulting in small departures 
of $p_i$ from P in these areas. This is demographically more feasible when one group 
is small in comparison to the other and it is less feasible when groups are equal in 
size. Elsewhere I establish that the D-Smin relationship when groups are equal in 
size is S = $D^2$ (Fossett 2015). This relationship is reflected in the curve that is 
closest to the diagonal. This curve documents that the absolute and relative magnitude 
of the possible D-S difference can be substantial even when it is at its minimum. 
The D-S difference when groups are equal in size reaches a maximum of 25 points 
when D is 50 and it is 20 points or more when D is in the range 28-72. In relative 
terms, the value of S can be up to 20% lower than the value of D when D is 80; up 
to 30% lower when D is 70; up to 40% lower when D is 60; up to 50% lower when 
D is 50; and so on. 
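
The figures quoted for the equal-size case follow directly from the relationship S = D² and can be worked out as below. This is an illustrative derivation rather than part of the original text; D is written as a proportion, so 0.25 corresponds to 25 points.

```latex
% Absolute D-S difference when S_min = D^2 (groups equal in size)
D - S_{\min} = D - D^2 = D(1 - D),
\qquad
\frac{d}{dD}\,D(1 - D) = 1 - 2D = 0 \;\Rightarrow\; D = 0.5,
\qquad
D(1 - D)\big|_{D = 0.5} = 0.25 .

% Range of D over which the difference is at least 0.20
D - D^2 \ge 0.20
\;\Longleftrightarrow\;
D^2 - D + 0.20 \le 0
\;\Longleftrightarrow\;
\frac{1 - \sqrt{0.2}}{2} \le D \le \frac{1 + \sqrt{0.2}}{2}
\;\approx\; 0.276 \le D \le 0.724 .

% Relative D-S difference
\frac{D - D^2}{D} = 1 - D
\quad\text{(e.g., } 0.20 \text{ at } D = 0.80,\; 0.40 \text{ at } D = 0.60\text{)} .
```

In whole points of D the second result corresponds to the range 28-72, and the third reproduces the relative differences listed above.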

The D-Smin curves plotted at selected values of P depart further from the 
diagonal as the racial composition of the city becomes progressively more imbalanced. 
Since most White-Minority segregation comparisons in empirical studies involve 
groups that differ greatly in size, these curves are highly relevant. They document 
that potential D-S differences can be very large in both absolute and relative terms 
under combinations of D and P that are common in “real world” settings. When P is 
85, the D-Smin difference exceeds 25 when D is in the range of 30-93 and it exceeds 
40 when D is in the range of 56-83. In relative terms, the value of S can be up to 
50% lower than the value of D when D < 82 and 70% lower or more when D < 58. 
The potential D-S differences are even more dramatic when P is 95 or higher. For 
example, when P is 95, the D-Smin difference exceeds 25 when D is in the range of 
28-98 and it exceeds 40 when D is in the range of 44-96. In relative terms, the value 
of S can be up to 50% lower than the value of D when D < 94 and 70% lower or more 
when D < 84. 
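
Ranges of this kind can be explored with a short Python scan over whole-point values of D. The sketch below rests on an interpretive assumption that should be kept in mind: it reads the stated goal of bringing the total population share of the non-parity areas as close to 0.5 as possible as choosing X within [0, 1 - D] so that PD + X is as near 0.5 as feasible, using the Model A1 expressions from Fig. 7.3. That reading coincides with min(|s_T3 - 0.5|, 1 - D) whenever PD is at or below 0.5. The function names are hypothetical and D and P are entered as proportions.

```python
def smin_clamped(D, P):
    """S under dispersed displacement with X chosen in [0, 1 - D] so that the
    Area 1 total share (P*D + X) is as close to 0.5 as feasible (assumed reading)."""
    X = min(max(0.5 - P * D, 0.0), 1.0 - D)
    p1 = P * (D + X) / (P * D + X)                 # proportion White, Area 1
    p3 = P * (1.0 - D - X) / (1.0 - P * D - X)     # proportion White, Area 3
    return D * (p1 - p3)


def d_range_where_gap_exceeds(P, threshold):
    """Lowest and highest whole-point D (1-99) at which D - S exceeds the threshold."""
    hits = [d for d in range(1, 100)
            if d / 100 - smin_clamped(d / 100, P) > threshold]
    return (min(hits), max(hits)) if hits else None


if __name__ == "__main__":
    for threshold in (0.25, 0.40):
        print(threshold, d_range_where_gap_exceeds(0.85, threshold))
```

Under these assumptions the scan for P of 85 returns (30, 93) for the 25-point threshold and (56, 83) for the 40-point threshold, consistent with the ranges quoted above for that composition. Results at other values of P may differ slightly from the quoted figures depending on how X is chosen.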

Importantly, group size differentials of this magnitude are common in empirical 
studies of segregation in US cities. For example, they are typical of White-Asian 
comparisons in most cities and they are typical of White-Latino comparisons in the 
“new destination” communities of the Midwest, South, and Northeast. The potential 
for D-S differences to be very large in these situations is clearly revealed in Fig. 7.4. 
The patterns seen here provide compelling evidence that the prevailing practice of 
examining only D in empirical studies of segregation should be reconsidered. The 
curves in the figure document that the level of group separation and area racial 
polarization as measured by S can vary widely across cities that are identical in 
terms of group displacement from even distribution (D) and relative group size (P). 

Figure 7.5 makes the same point but from the vantage point of the separation 
index (S) instead of the dissimilarity index (D). Here the diagonal depicts the values 
of D plotted by S when displacement from even distribution is maximally concen- 
trated (Smax). The curves in the figure depict the values of D plotted by S when dis- 
placement from even distribution is maximally dispersed. The implication of these 
curves is straightforward. If one is interested in group separation as measured by S, 
D is an unreliable indicator because D can take very high values when groups are 
not residentially separated. This occurs when group displacement from even 
distribution is extensive but the group populations are dispersed across non-parity 
areas in a way that minimizes group concentration and maximizes group mixing and 
co-residence. 


7.4.3 Implications of Findings from Analytic Models for Smax 
and Smin 


The preceding discussion establishes that scores for D and S can differ depending 
on three factors. The first is whether displacement of groups from even distribution 
is present and is substantial. All else equal, the potential for D and S to differ is 
greatest when D is high (e.g., at or above 60) but less than its maximum of 100.¹¹ 
The second factor is whether the group displacement from even distribution in 
question is concentrated or dispersed. When displacement is maximally concentrated, 
S = Smax = D; when displacement is maximally dispersed (minimally 
concentrated), S = Smin ≤ $D^2$. The third factor is the relative sizes of the groups in the 
segregation comparison. All else equal, the maximum possible difference between 
D and S is larger when groups are unequal in size. Accordingly, the logical 
possibility for a large D-S difference is greatest under the following conditions: (1) 
displacement from even distribution is extensive (i.e., D is high), (2) displacement is 
maximally dispersed, and (3) the groups are highly unequal in relative size (e.g., 
|P - Q| > 90). Analysis of empirical segregation patterns presented in Chap. 8 will 
document examples of such situations and establish that large D-S discrepancies are 
not just logically possible; they can and do occur with some regularity in empirical 
studies. 


7.5 Is Separation a Distinct Dimension of Segregation? 


I conclude this chapter by considering the issue of whether group separation and 
area racial polarization as measured by S should be viewed as a distinct dimension 
of segregation. Stearns and Logan (1986) argued that D and S tap different aspects 
of group differences in residential distribution and noted that D and S can differ 
both in overall value and in direction of change. On this basis they argued that S is 
a distinct dimension of segregation and should be routinely examined in empirical 
studies. The core of their position is that, unlike D, S registers whether or not groups 
live apart due to both groups being concentrated in homogeneous areas, a residential 
pattern of compelling substantive interest to researchers. 


11 When displacement reaches its maximum possible level, S = D = 100 and the D-S difference is 
necessarily zero. Similarly, if there is no displacement from even distribution, S = D = 0 and the 
D-S difference is zero. 

The view Stearns and Logan advocate runs counter to most methodological 
studies which view S as one among many alternative measures of uneven distribution 
including the gini index (G), the dissimilarity index (D), the Theil entropy index 
(H), and the Atkinson index (A) represented here by the closely related Hutchens 
square root index (R) (Zoloth 1976; James and Taeuber 1985; White 1986; Reardon 
and Firebaugh 2002).¹² These various alternative indices all differ from each other 
in at least the narrow sense that they can yield different numerical scores when 
applied to the same residential distributions. So the question arises, when does one 
measure become different enough from the alternatives that it should be considered 
a distinctive dimension of segregation? 

One basis for grouping indices together is similarity of computing formulas — the 
operational implementations of the conceptions of segregation embodied in the 
indices. On this basis one can argue that the separation index (S) is a measure of 
even distribution based on the close similarity of one of its computing formulas with 
a computing formula for the index of dissimilarity (D). 


$$ D = 100 \cdot (1 / 2TPQ) \cdot \sum_i t_i\,|p_i - P| $$


$$ S = 100 \cdot (1 / TPQ) \cdot \sum_i t_i\,(p_i - P)^2 $$


The view can also be supported by noting the close similarity of the following com- 
puting formulas for the separation index (S) and the Hutchens square root index (R) 
which empirically is closely related to D as well as to Atkinson’s A.¹³ 


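$$ R = 100 \cdot \left[ 1 - (1/T) \cdot \sum_i t_i \sqrt{p_i q_i / PQ} \right] $$


$$ S = 100 \cdot \left[ 1 - (1/T) \cdot \sum_i t_i\, p_i q_i / PQ \right] $$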
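
To make the parallels among these computing formulas concrete, the following Python sketch evaluates D, both forms of S, and R from area counts. It is an illustration only: the function name, the returned keys, and the three-area counts are hypothetical, every area is assumed to have a nonzero population, and each area term is weighted by its population $t_i$.

```python
from math import sqrt


def uneven_distribution_indices(whites, blacks):
    """Compute D, S (two equivalent forms), and R on the 0-100 scale
    from per-area counts of Whites and Blacks."""
    W, B = sum(whites), sum(blacks)
    T = W + B
    P, Q = W / T, B / T
    t = [w + b for w, b in zip(whites, blacks)]   # area populations
    p = [w / ti for w, ti in zip(whites, t)]      # area proportion White
    q = [1.0 - pi for pi in p]                    # area proportion Black

    D = 100 * (1 / (2 * T * P * Q)) * sum(ti * abs(pi - P) for ti, pi in zip(t, p))
    S_var = 100 * (1 / (T * P * Q)) * sum(ti * (pi - P) ** 2 for ti, pi in zip(t, p))
    S_pq = 100 * (1 - (1 / T) * sum(ti * pi * qi for ti, pi, qi in zip(t, p, q)) / (P * Q))
    R = 100 * (1 - (1 / T) * sum(ti * sqrt(pi * qi / (P * Q)) for ti, pi, qi in zip(t, p, q)))
    return {"D": D, "S_var": S_var, "S_pq": S_pq, "R": R}


if __name__ == "__main__":
    # Hypothetical three-area city with equal-sized groups.
    result = uneven_distribution_indices([90, 50, 10], [10, 50, 90])
    for name, value in result.items():
        print(name, round(value, 2))
```

The two forms of S agree because the sum of $t_i (p_i - P)^2$ equals $TPQ$ minus the sum of $t_i p_i q_i$, which is the algebraic link between the variance-style formula and the formula that parallels R.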


Similarity of computing formulas for measures of uneven distribution also can be 
summarized in another, more abstract way. S is like G, D, R, and H in that all of 
these indices can be described in the following way. The value of each of these 
indices registers the population weighted average of quantitative scoring of the 
deviations of area pairwise racial composition ($p_i$) from the pairwise racial 
composition of the city (P) overall, normalized to the range 0-1 where 0 indicates no 
deviations and 1 indicates that deviations have reached the maximum possible 
result.¹⁴ 


12 I make two qualifications. First, technically, Massey and Denton (1988) classified S as an 
exposure measure, but they noted others classify it as a measure of uneven distribution. Second, James 
and Taeuber (1985), Massey and Denton (1988), and White (1986) include the Atkinson index (A) 
as a measure of uneven distribution. But I instead list the Hutchens square root index (R) which is 
a superior and closely related substitute for the Atkinson index. 


13 D and R both rank segregation comparisons in accord with the principle of segregation 
dominance. Using the data set for analyses reported in this chapter the simple linear correlation of D and 
R is extremely high (0.962) and the correlation is even higher when allowing for nonlinearity (the 
correlation of D with the square root of R is 0.984). 

Finally, S fares well when it is reviewed on non-controversial technical criteria 
suggested for measures of uneven distribution. Ironically, it fares much better than 
D, the most widely used measure of uneven distribution (Reardon and Firebaugh 
2002). 

From the points just reviewed there is a clear case for grouping S with other 
measures of uneven distribution. But there is room for further discussion on both 
conceptual and practical grounds. On conceptual grounds, the theory of segregation 
measurement can be described as “incomplete.” This means that the generally 
accepted criteria for evaluating measures of uneven distribution are compatible with 
a variety of measures each of which embodies a unique, albeit implicit, conception 
of uneven distribution. For now, however, the ambiguity of the situation is not likely 
to be eliminated. Some criteria for measuring even distribution, such as the exchange 
principle discussed earlier in this chapter, have been endorsed widely (e.g., James 
and Taeuber 1985; White 1986; Reardon and Firebaugh 2002). Other criteria 
that would reduce ambiguity in measurement have been offered but not widely 
accepted. 

In particular, the criterion of “composition invariance” offered by James and 
Taeuber (1985) is seen as controversial, as is Taeuber and James’ (1982) 
criticism of the separation index (termed V in their discussion) based on related 
concerns. This principle has the practical consequence of requiring indices to order 
segregation comparisons in agreement with the principle of “segregation curve 
dominance.”¹⁵ Two widely used indices, the separation index (S) and the Theil 
entropy index (H), do not satisfy this criterion. However, the criterion itself is 
controversial. Some have explicitly and forcefully rejected it (e.g., Coleman et al. 1982; 
White 1986). Others note the criterion has been suggested but do not endorse it 
(e.g., Reardon and Firebaugh 2002). The “revealed consensus” in the empirical 
literature has been that researchers ignore the criterion and use H and S when they find 
these indices to be useful for meeting the needs of their study.¹⁶ 

So where do things stand? If one accepts the principles of “composition invari- 
ance” and “segregation curve dominance” as integral and essential to the measure- 
ment of uneven distribution, the separation index (S) and also the Theil index (H) 
cannot be considered measures of uneven distribution. Under this circumstance, 


14 Thus, the index scores are normalized to the range 0-1 by dividing the average deviation scores 
by the maximum value the average can take under complete segregation. 

15 Even if this principle is accepted, segregation measurement theory is still technically incomplete 
because the principle is silent on how segregation comparisons should be ranked when segregation 
curves cross, as they sometimes do. This is less important on practical grounds as indices that 
satisfy the principle of segregation curve dominance tend to correlate at very high levels. 

16 Subordinating measurement principles to researcher needs is typical, not uncommon, as the most 
widely used index, D, does not satisfy the non-controversial principles of transfers and exchanges. 




Stearns and Logan (1986) would then be correct in arguing that S taps a distinct 
dimension of segregation.17 

Personally, I am comfortable with this position. It would reduce ambiguity in the 
current relatively flexible notion of uneven distribution by distinguishing between 
indices that measure displacement and indices that measure separation. Displacement 
would be compatible with the geometric interpretation of the Gini index (G) in relation 
to the segregation curve and the closely related vertical distance and volume of 
movement interpretations of the dissimilarity index (D). Displacement also would 
be compatible with notions of group difference on rank-order position on area racial 
composition. G would then stand as an attractive index of displacement as it satis- 
fies the principle of exchanges and responds to all directional changes in rank-order 
differences between groups and thus supports interpretation as the “net difference” 
in group rank order advantage noted by Lieberson (1976). D would then stand as a 
crude version of G that may be useful due to its simplicity and ease of calculation 
even though it does not satisfy the principle of exchanges and responds only to directional 
changes in rank-order distribution above and below P. 

Two other measures, the symmetric version of the Atkinson index (A) and the 
Hutchens square root index (R), also could be categorized as measures of displacement. 
So far as I am aware, they do not offer the specific geometric interpretation of 
displacement that is available for G and D. But they are like G and D in satisfying 
the criterion of segregation curve dominance and their values correlate very closely 
with values of D and G in empirical analyses. Hutchens (2004) makes the case that 
R has attractive options for certain kinds of analysis based on being “additively 
decomposable” where G and D are not. 

Separation registers differences in group distribution that are not registered by 
displacement as measured by G, D, and R. Separation assesses group differences in 
quantitative position, instead of rank-order position, on area racial composition. A 
formal distinction can be made between displacement and separation by adopting a 
“polarization” criterion to supplement the “exchange criterion.” The current 
exchange criterion is minimal; it requires only that an index register an integration- 
promoting exchange. A polarization criterion supplement would additionally 
require the following. 


All else equal, exchanges involving more polarized areas and resulting in larger average 
reductions in same-group contact should have greater impact on an index than exchanges 
involving less polarized areas and resulting in smaller average reductions in same-group 
contact. 


Specifically, the principle would require the impact of the exchange on the index 
score to increase as the value of $|p_i - p_j|$ increases. Thus, in the example exchanges 
discussed earlier in this chapter, S will respond more strongly to an exchange 


17 The issue is more complicated than my statement suggests. The dissimilarity index (D) does not 
satisfy the principle of transfers — a principle that does enjoy consensus support — yet methodologi- 
cal reviews typically characterize D as a valid measure of uneven distribution on the grounds that 
the practical consequences of violating the principle of transfers are not sufficient to justify disal- 
lowing the measure. 
between highly polarized areas such as an exchange between one area with 100 
Whites and 0 Blacks and another area with 0 Whites and 100 Blacks and less 
strongly to an exchange between minimally polarized areas such as an exchange 
between one area with 51 Whites and 49 Blacks and another area with 49 Whites 
and 51 Blacks. In contrast, as demonstrated in examples reviewed earlier and estab- 
lished more carefully elsewhere (e.g., James and Taeuber 1985; Reardon and 
Firebaugh 2002), D will treat these exchanges as identical in impact. 
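
The contrast can be checked numerically. The following Python sketch is an illustration only. The four-area city, the helper names, and the scoring of S as the White-Black difference in mean area proportion White (the difference-of-group-means form) are assumptions made for the example and are not data from the text.

```python
def dissimilarity(areas):
    """D = 0.5 * sum of |w_i/W - b_i/B| over areas given as (white, black) counts."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    return 0.5 * sum(abs(w / W - b / B) for w, b in areas)


def separation(areas):
    """S = White mean of p minus Black mean of p, where p is area proportion White."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    white_mean = sum(w * w / (w + b) for w, b in areas if w + b > 0) / W
    black_mean = sum(b * w / (w + b) for w, b in areas if w + b > 0) / B
    return white_mean - black_mean


def exchange(areas, i, j):
    """Move one White household from area i to area j and one Black household from j to i."""
    out = [list(a) for a in areas]
    out[i][0] -= 1
    out[j][0] += 1
    out[j][1] -= 1
    out[i][1] += 1
    return [tuple(a) for a in out]


# Hypothetical city: two fully polarized areas and two nearly balanced areas.
city = [(100, 0), (0, 100), (51, 49), (49, 51)]
polarized = exchange(city, 0, 1)  # exchange between the 100/0 and 0/100 areas
mild = exchange(city, 2, 3)       # exchange between the 51/49 and 49/51 areas

d_change_polarized = dissimilarity(polarized) - dissimilarity(city)
d_change_mild = dissimilarity(mild) - dissimilarity(city)
s_change_polarized = separation(polarized) - separation(city)
s_change_mild = separation(mild) - separation(city)

print("Change in D:", round(d_change_polarized, 4), round(d_change_mild, 4))
print("Change in S:", round(s_change_polarized, 4), round(s_change_mild, 4))
```

In this configuration D falls by 0.01 under either exchange, while S falls by about 0.0198 under the polarized exchange and by only about 0.0002 under the minimally polarized one.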

G responds in a more complicated way that ultimately is similar in nature to D. G 
will potentially treat these exchanges differently, but not based on the quantitative 
magnitude of the level of polarization; that is, not in proportion to the value of $|p_i - p_j|$. 
Instead, since G assesses group differences in rank order position, it will treat these 
exchanges differently when they differ in terms of the share of the combined group 
populations residing in areas with values on racial composition (p) that fall in 
between the values on racial composition for the two areas involved in the exchange. 
Specifically, G would be reduced by a larger amount when the exchange causes the 
moving White and Black households to cross over a larger “intermediate” population; 
that is, a larger share of the combined group populations residing in “intermediate” 
areas where area proportion White (p) is larger than that for the area receiving 
the White household ($p_j$) and smaller than that for the area sending the White 
household ($p_i$). This property of G has little practical consequence for overcoming 
insensitivity to polarization because, if the quantitative difference between the two areas 
(i.e., $|p_i - p_j|$) is small, polarization is small and G, like D, can respond strongly to 
group differences in distribution across areas that are similar in terms of area racial 
composition. 

One potential benefit of adopting a strong conceptual distinction between dis- 
placement and separation is that it would reduce ambiguity in segregation measure- 
ment. It would make something clear both to researchers and also to consumers of 
segregation research. Namely, it would clarify that 


Segregation indices that rank segregation comparisons in terms of the segregation curve are 
poor choices for measuring group residential separation and area racial polarization. 


Similarly, it would signal that 


Segregation indices that measure group residential separation and area racial polarization 
are poor choices for measuring group displacement from even distribution. 


It appears, however, that prevailing practices in empirical research place greater 
priority on practical concerns such as flexibility and ease of use rather than confor- 
mity to technical measurement criteria. When approaching segregation guided by 
these priorities, which some might view as appropriate since key aspects of segrega- 
tion measurement theory are unresolved, one could argue that D and S both measure 
uneven distribution construed broadly. However, even when one adopts this view, it 
is important to recognize and acknowledge the following. 


D and S are sufficiently different in behavior that the choice between them has potentially 
important consequences for empirical findings that should not be overlooked. 
Once this point is acknowledged, the responsibility falls to researchers to first determine 
whether index choice matters for the findings obtained in any given empirical 
analysis and, when it does, to then report this and note the implications it may carry. 

All indices of uneven distribution register group differences in residential distri- 
butions differently. But some differences are negligible on practical grounds while 
others are potentially more important. In the case of D and S, the differences are 
especially likely to have important practical consequences for findings in empirical 
studies because they are at opposite ends of a continuum in how indices respond to 
group difference in distribution on area group proportion (p). Specifically, as a crude 
form of G, D is sensitive to rank order differences without regard to the quantitative 
magnitude of the differences involved while S is sensitive to quantitative differences 
that are large in size and is only weakly responsive to rank order differences that 
involve small quantitative differences.18 Understanding this difference helps clarify 
the nature of segregation patterns when D and S yield different results. Because D 
is sensitive to group differences in rank position on area group proportion (p), D can 
take high values even when the group differences on p are small in quantitative 
magnitude but are extensive. In contrast, S takes high values only when group dif- 
ferences on area group proportion (p) are quantitatively large, and will take low 
values when group differences in rank position on p are extensive but the quantita- 
tive differences involved are small. 

Whether one sees this practical difference between D and S as justifying the 
conclusion that they measure distinctly different dimensions of segregation is a mat- 
ter of judgment. I take the position that, at the very least, it is important to note that 
the two measures are similar in measuring group differences on area group proportions 
($p_i$) and give researchers the option of assigning priority to rank-order differences 
or to quantitative differences. The choice between the two options is important 
because rank-order differences can be high even when groups live together in areas 
that differ by small amounts on area racial composition and quantitative differences 
can only be high when groups live apart in areas that differ substantially on area 
racial composition. 

Once this point is “on the table”, the choice between indices becomes sharply 
defined. If one adopts the separation index (S) one is choosing to focus on quantita- 
tive differences between group residential outcomes on racial composition and the 
question of whether groups live together or apart. When S takes high values, it also 
necessarily implies the presence of substantial differences in rank order position on 
racial composition as values of D cannot fall below values of S. This clearly fits well 
with prevailing, albeit usually implicit, notions regarding what I term “prototypical” 
segregation wherein rank-order and quantitative differences track each other closely. 
The contrasting possibility is when group differences in rank-order position on area 
racial composition ($p_i$) are widespread but small in magnitude, resulting in 
a high-D, low-S outcome. The possibility of this outcome is not widely recognized. 


18 As noted earlier, G registers group differences in rank position on area group proportion (p) 
regardless of the quantitative magnitude of the differences involved. D is a crude version of G and 
behaves in a similar way. 
Perhaps because of this, compelling arguments for why one would prioritize this 
result over assessments of prototypical segregation have not been articulated in the 
literature. 
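
The observation that values of D cannot fall below values of S can be illustrated with a small simulation. The Python sketch below is illustrative only. The randomly generated area counts and the function names are hypothetical, and S is scored as the White-Black difference in mean area proportion White, which is an assumption about the formula rather than a quotation of it.

```python
import random


def dissimilarity(areas):
    """D = 0.5 * sum of |w_i/W - b_i/B| over areas given as (white, black) counts."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    return 0.5 * sum(abs(w / W - b / B) for w, b in areas)


def separation(areas):
    """S = White mean of p minus Black mean of p, where p is area proportion White."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    white_mean = sum(w * w / (w + b) for w, b in areas if w + b > 0) / W
    black_mean = sum(b * w / (w + b) for w, b in areas if w + b > 0) / B
    return white_mean - black_mean


rng = random.Random(0)  # fixed seed so the run is repeatable
violations = 0
for _ in range(1000):
    n_areas = rng.randint(2, 30)
    areas = [(rng.randint(0, 500), rng.randint(0, 500)) for _ in range(n_areas)]
    # Skip degenerate cities in which one group is entirely absent.
    if sum(w for w, b in areas) == 0 or sum(b for w, b in areas) == 0:
        continue
    # Small tolerance guards against floating-point noise when S equals D.
    if separation(areas) > dissimilarity(areas) + 1e-12:
        violations += 1

print("Cases with S above D:", violations)
```

A run of this kind is a spot check, not a proof, but it is consistent with the logical relationship that S never exceeds D.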

My own sense of the matter is that researchers should always examine S because 
it registers an aspect of residential distributions that is sociologically compelling 
and clearly relevant for the concerns that motivate researchers to assess uneven 
distribution in the first place. For example, Taeuber, a leading segregation researcher 
whose efforts popularized the use of the dissimilarity index, motivated one of his 
influential studies of White-Black differences in residential distribution by stating 
that “[r]esidential segregation of whites and nonwhites effects their separation in 
schools, hospitals, libraries, parks, stores, and other institutions” (1964:42; empha- 
sis added). 

The distinction between separation and “mere” displacement is important 
because residential separation is a logical prerequisite for groups to have fundamen- 
tally different neighborhood outcomes and life chances based on area of residence. 
To the extent that residential outcomes and life chances are linked to area of residence, 
groups will tend to have similar residential outcomes and life chances when 
the two populations live together.19 All else equal, populations that reside together 
share the same physical and built environment whether despoiled and blighted or 
scenic and well kept; they likewise share the same neighborhood amenities such as 
roads, sidewalks, air and water quality; they have the same neighbors; they share the 
same neighborhood institutions, businesses, and public services; they have the same 
public schools; they have the same exposure to noise, crime and social problems; 
and so on. 

Alternatively, as Stearns and Logan pointed out, polarization of neighborhoods 
into White and minority areas makes minority households concentrated in minority 
areas vulnerable to discriminatory practices such as formal and informal redlining 
for loans and insurance coverage for homes and businesses that can undermine 
property values and inhibit private and public investment. Similarly, area racial 
polarization puts minority areas and minority households at risk of disadvantage in 
neighborhood outcomes resulting from differential siting of less desirable public 
institutions such as prisons, half-way houses, low-income housing developments, 
waste management facilities, etc., and similarly at risk for inequality in quantity 
and/or quality of schools, parks, libraries, government services, roads and other 
public infrastructure, and so on (Stearns and Logan 1986:127-128). 


19 Of course, residential outcomes and life chances can differ substantially for groups that live 
together when stratification processes are tied to group-membership independently of area of resi- 
dence. The Jim Crow South is an example where groups could live together at the neighborhood 
level but have fundamentally different life chances based on group membership. For example, in 
the extreme, Whites and Blacks living together in the same neighborhood — and sometimes even 
on the same block and residential property — went to different schools and used different public 
amenities such as water fountains, restrooms, and swimming pools. Even in this circumstance, 
however, many public goods aspects of neighborhoods — such as desirable amenities, roads, expo- 
sure to natural and man-made hazards, etc. — are shared equally when groups reside together. 
In the ideal, research motivated by the kind of concerns just noted would assess 
group disparities on the relevant neighborhood characteristics directly. But, unfor- 
tunately, the requisite data are not available in comprehensive form. The next best 
option is to determine whether group residential separation creates the logical 
potential for disparities to be pronounced. The separation index (S) is directly rele- 
vant for this concern. Measures of displacement, D in particular, are not reliable 
substitutes. 
Chapter 8 
Further Comments on Differences Between 
Displacement and Separation 


In Chap. 6 I documented that displacement (D) and separation (S) routinely diverge 
by large amounts in some empirical analyses. Then in Chap. 7 I provided technical 
discussions to clarify how D and S can vary independently. I also stressed that the 
combination of high-D, low-S, which occurs when displacement from even distribution 
is dispersed rather than concentrated, has important sociological implications 
and I advised researchers to check for this pattern and guard against incorrectly 
assuming that high levels of displacement (D) are accompanied by high levels of 
group separation (S). In this chapter I try to encourage researchers to follow this 
advice by discussing three topics relevant to measuring separation and understand- 
ing how it may diverge from displacement. 

I begin by revisiting the empirical relationship of D and S originally discussed in 
Chap. 6 and reviewing it in more detail in light of the material presented in Chap. 7. 
I then review plausible scenarios for how displacement can come to be dispersed in 
a way that creates large D-S differences. Discussions of this topic are not common 
in the literature. I address this gap to help researchers become more comfortable 
with giving attention to the contrast between dispersed and concentrated displacement 
from even distribution. I next focus on a practical issue researchers should 
bear in mind when seeking to measure and compare displacement and separation. I 
then conclude the chapter by noting an alternative option for measuring group sepa- 
ration and area racial polarization some researchers may find useful because it is 
easy to compute and explain and also tends to correlate closely with the separation 
index. 
8.1 Revisiting the Empirical Relationships of Displacement 
(D) and Separation (S) 


I now examine empirical differences between D and S in more detail by revisiting 
the data on White-Minority residential segregation in core based statistical areas 
(CBSAs) for 1990, 2000, and 2010 originally discussed in Chap. 6. My goal is to 
examine D-S differences in light of the perspective gained from the 
material presented in Chap. 7. Figure 8.1 plots scores for the separation index (S) by 
scores of the dissimilarity index (D) for CBSAs in 1990, 2000, and 2010. The plot 
includes 4,319 White-Minority segregation comparisons screened on having at least 
1,500 persons in both groups in the comparison. The diagonal reference line plots D 
against itself. The figure shows that in empirical application values of S are consis- 
tently lower than values of D. Logically, it is possible for the values of D and S to 
be equal in any comparison. But this occurs only when all group displacement from 
even distribution is concentrated in all-White or all-minority areas. It is readily 
apparent from the figure that even an approximation of this outcome is an uncom- 
mon occurrence for the cases in this data set. Figure 8.2 reverses the point of view 
for the relationship and plots scores for the dissimilarity index (D) by scores of the 
Fig. 8.1 Separation index (S) by dissimilarity index (D) for White-Minority segregation comparisons 
computed using block-level data for CBSAs in 1990, 2000, and 2010 (Reference lines: 
Diagonal for D by D and reference curves for 100%, 75%, and 50% of $D^{3/2}$. 4,319 cases for 
White-Black, White-Latino, and White-Asian segregation comparisons with at least 1,500 persons in both 
groups) 
Fig. 8.2 Dissimilarity index (D) by separation index (S) for White-Minority segregation comparisons 
computed using block-level data for CBSAs in 1990, 2000, and 2010 (Reference lines: 
Diagonal for S by S and reference curves for 100%, 75%, and 50% of $S^{2/3}$. 4,319 cases for 
White-Black, White-Latino, and White-Asian segregation comparisons with at least 1,500 persons in both 
groups) 


separation index (S). Here the diagonal reference line plots S against itself. 
Unsurprisingly, the figure shows that values of D in this data set are consistently 
higher than values of S. The main benefit of this figure is to highlight how values of 
D can be misleading if one’s goal in measuring segregation is to identify prototypi- 
cal segregation involving group residential separation. 

The curved reference lines near the diagonal in each figure serve to highlight a 
“stylized fact” for D-S correspondence. It is the empirical regularity that, while it is 
logically possible for S to take a value equal to D in any comparison, values of S 
rarely exceed $D^{3/2}$ in empirical analyses. Similarly, values of D rarely fall below $S^{2/3}$. 
In view of this empirical relationship, I characterize cities that fall along the interior 
boundary of the empirical D-S relationship depicted in Figs. 8.1 and 8.2 as cities 
where segregation follows a “prototypical” pattern. By this I mean that group dis- 
placement from even distribution registered by D is substantially concentrated and 
produces group residential separation registered by S. 

More specifically, I characterize segregation as clearly “prototypical” when 
scores for D and S track each other in parallel based on the mild nonlinear relationships 
of $D = S^{2/3}$ and $S = D^{3/2}$. Thus, for example, to characterize a city as having 
General Categories        D Range    S Range 
Very High / Pronounced    75-100     65-100 
High / Substantial        60-74      45-64 
Medium / Moderate         45-59      30-44 
Low / Limited             20-44      10-29 
Very Low / Minimal        0-19       0-9 


Fig. 8.3 Guidelines for identifying prototypical segregation based on concordant scores for dis- 
similarity (D) and separation (S) when displacement from even distribution is substantially 
concentrated 


a prototypical pattern of segregation I would expect S to be near or above 65 when 
D is 75; or, conversely, I would expect D to be near or below 75 when S is about 65. 
The reference lines in the two figures reflect how values of D and S will correspond 
when “prototypical” segregation varies from low to medium to high. For convenience 
and consistent use of terms for describing the levels of segregation when 
displacement and separation are concordant, I offer guidelines in Fig. 8.3 for broad 
categories of prototypical segregation where dissimilarity (D) and separation (S) are 
concordant. When D and S align as they do in these broad categories, it is reason- 
able to describe displacement from even distribution as being substantially concen- 
trated such that groups are living apart, rather than together, roughly in keeping with 
the degree possible at the observed level of displacement from even distribution. 

In Fig. 8.4 I offer a more detailed set of guidelines for judging when D and S do 
not correspond as one would expect when displacement from even distribution is 
concentrated in the manner that produces a pattern of “prototypical segregation.” 
The first two columns list values of D and S that are “clearly concordant” meaning 
that the D-S combinations listed involve values of the separation index (S) that are 
in the higher range of what is possible given the level of displacement from even 
distribution indicated by the dissimilarity index (D). The quantitative guideline I 
apply for “clear concordance” of D and S is for the value of S to be equal to or 
higher than 95% of $D^{3/2}$. The third column lists values of S that lead me to characterize 
the D-S comparison as “Discordant” meaning that, instead of being substantially 
concentrated, displacement from even distribution is substantially dispersed and 
consequently produces a level of group separation that is well below that expected 
under prototypical segregation. The quantitative guideline I apply is that S is at or 
below 75% of $D^{3/2}$. The fourth column lists values of S that lead me to characterize 
the D-S comparison as “Very Discordant” meaning that displacement from even 
distribution is highly dispersed and produces a level of group separation that is very 
low in comparison to that expected under prototypical segregation. The quantitative 
guideline I apply is that S is at or below 50% of $D^{3/2}$. 
           S Value is           S Value is          S Value is 
           Concordant           Discordant          Highly Discordant 
           (Displacement is     (Displacement is    (Displacement is 
           Substantially        Dispersed)          Highly Dispersed) 
D Value    Concentrated) 
D = 90     S ≥ 81               S ≤ 60              S ≤ 35 
D = 80     S ≥ 68               S ≤ 50              S ≤ 28 
D = 70     S ≥ 56               S ≤ 40              S ≤ 21 
D = 60     S ≥ 44               S ≤ 31              S ≤ 15 
D = 50     S ≥ 34               S ≤ 23              S ≤ 10 
D = 40     S ≥ 24               S ≤ 15              S ≤ 5 
D = 30     S ≥ 16               S ≤ 8               --- 
D = 20     S ≥ 9                S ≤ 3               --- 


Fig. 8.4 Guidelines for assessing concordance-discordance of dissimilarity (D) and separation 
(S) (Concordant (displacement is substantially concentrated) with S ≥ 95% of $D^{3/2}$; discordant 
(displacement is dispersed) with S ≤ 75% of $D^{3/2}$; and highly discordant (displacement is highly 
dispersed) with S ≤ 50% of $D^{3/2}$) 
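
The percentage guidelines stated in the text can be expressed as a short classification routine. The Python sketch below is a minimal illustration, not the procedure used to build the figure. The function name, the 0-100 input scale, and the label "intermediate" for values between the concordant and discordant thresholds are assumptions introduced here, and the rounded cutoffs printed in Fig. 8.4 need not coincide exactly with what the percentage rules return.

```python
def classify_concordance(d, s, exponent=1.5):
    """Classify a D-S pair (both on a 0-100 scale) against the benchmark S = D**(3/2).

    Thresholds follow the percentages stated in the text:
    concordant at or above 95%, discordant at or below 75%,
    very discordant at or below 50% of the benchmark.
    """
    d_prop = d / 100.0
    s_prop = s / 100.0
    benchmark = d_prop ** exponent  # S expected under prototypical segregation
    if s_prop >= 0.95 * benchmark:
        return "concordant"
    if s_prop <= 0.50 * benchmark:
        return "very discordant"
    if s_prop <= 0.75 * benchmark:
        return "discordant"
    return "intermediate"  # hypothetical label for the unnamed middle range


for d, s in [(90, 85), (60, 40), (60, 30), (51, 2)]:
    print(d, s, classify_concordance(d, s))
```

Comparing against the benchmark by multiplication rather than division keeps the routine well defined when D is zero.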


Figures 8.1 and 8.2 include reference lines that correspond to the quantitative 
guidelines just outlined. The figures thus document that many White-Minority com- 
parisons in these cities do have scores on D and S that place the cities in question 
comfortably within the category of having “prototypical segregation” wherein displacement 
from even distribution is accompanied by a corresponding level of 
group separation and area racial polarization. At the same time, however, the figures 
also make it clear that a great many White-Minority comparisons in these cities 
have D-S combinations that are either discordant or very discordant indicating that 
segregation does not follow the “prototypical” pattern that researchers and broad 
audiences assume is typical. 

In individual cases of a particular White-Minority comparison in a given city, 
D-S discrepancies can be discussed and evaluated in several ways including the 
following. 


• Comparing the simple D-S difference 
• Expressing S as a percentage of D (i.e., $100 \cdot S/D$) 
• Expressing S as a percentage of $D^{3/2}$ (i.e., $100 \cdot S/D^{3/2}$) 


If the simple D-S difference is small, the situation involves concentrated displacement 
from even distribution that produces group separation at near the maximum 
level possible given the extent of group displacement. When the D-S difference is 
large, it is clear that the situation involves “dispersed displacement” wherein 
group separation and neighborhood racial polarization are well below what is 
possible given the level of displacement. That is, while the groups differ substantially 
in proportions residing in below-parity areas, they nevertheless tend to 
live together in neighborhoods that vary in a relatively narrow range on racial mix 
(p) and are not residentially separated into racially homogeneous neighborhoods. 

The relative comparison of D and S should be considered when the simple D-S 
difference is non-negligible, but not extreme. Expressing S as a percentage of D 
indicates the relative extent to which displacement from even distribution is concen- 
trated. If the value reaches 100, it indicates that group displacement is maximally 
concentrated in a way that produces non-parity neighborhoods that are racially 
homogeneous (all same-group) or nearly so. 

The relative comparison of S and $D^{3/2}$ provides another reference point for assessing 
whether D and S are discordant. Values at 80% and above indicate that the 
values of D and S align in reasonable correspondence to what is expected when 
segregation follows a prototypical pattern at levels characterized as low, medium, 
high, etc. as suggested above. This means that, at a given level of group displace- 
ment from even distribution (D), the degree of group residential separation (S) is in 
line with standard expectations. If the value drops below 75%, it signals a D-S 
discrepancy wherein at least one group’s displacement from even distribution is 
dispersed rather than concentrated. Values that fall below 50 % indicate that at least 
one group’s displacement from even distribution is highly dispersed and thus it is not 
appropriate, and may even be substantially misleading, to characterize the two 
groups as living apart from each other. 
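
The three quick comparisons discussed above can be computed together. The following Python sketch is offered as an illustration under stated assumptions: the function and key names are hypothetical, D and S are entered on a 0-100 scale, and the comparison with $D^{3/2}$ is made after converting both scores to proportions.

```python
def ds_comparisons(d, s):
    """Return the three quick D-S comparisons for scores on a 0-100 scale (d must be > 0)."""
    d_prop = d / 100.0
    s_prop = s / 100.0
    return {
        "difference": d - s,                               # simple D-S difference
        "s_pct_of_d": 100.0 * s_prop / d_prop,             # S as a percentage of D
        "s_pct_of_benchmark": 100.0 * s_prop / d_prop ** 1.5,  # S as a percentage of D**(3/2)
    }


prototypical_case = ds_comparisons(75, 65)
dispersed_case = ds_comparisons(51, 2)

for label, result in [("D=75, S=65", prototypical_case), ("D=51, S=2", dispersed_case)]:
    print(label, {k: round(v, 1) for k, v in result.items()})
```

For D = 75 and S = 65 the third comparison is very close to 100, in line with the prototypical pattern, while for D = 51 and S = 2 it falls far below 50.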

When focusing on individual cases in detail, these guidelines for “quick com- 
parisons” can be supplemented with detailed comparisons of group distributions on 
area racial composition. Elsewhere (Fossett 2015) I provide a more extended review of graphical 
and quantitative analyses highlighting selected cases of White-Minority segregation 
that illustrate a variety of outcomes for D-S comparisons ranging from concordance 
(prototypical segregation) to very discordant (displacement without separation). 


8.2 Scenarios for How D and S Discrepancies Can Arise 


Segregation researchers rarely comment on whether displacement measured by D 
involves group separation and neighborhood polarization measured by S. This is 
understandable because the issue is rarely discussed in either empirical studies or in 
the literature on segregation measurement. Accordingly, some might wonder if it is 
easy for D and S to differ in dramatic ways. In Chap. 6, I reviewed data showing that 
this is indeed the case empirically when the scope of segregation analysis is broad 
(i.e., expands beyond large metropolitan areas) and when samples include cities 
where minority populations are small in relative size. 

Given the lack of discussion of dispersed displacement and D-S divergence, it is 
understandable that consumers of segregation research and researchers themselves 
may wonder “How can such patterns come about?” In this section I review some 
scenarios for how high-D, low-S situations can arise. My goal is to help readers gain 
a more intuitive understanding of how displacement can come to be extensive with- 
out also producing the high levels of group separation needed to create the pattern 
of prototypical segregation. 

To begin, imagine a city with 100 neighborhoods each of which has 1000 resi- 
dents. Additionally assume the city population is 98 % White and 2% Black with 
98,000 White residents and 2000 Black residents. Under exact even distribution all 
100 neighborhoods will have 980 White residents and 20 Black residents. This, of 
course, would be a pattern of “no segregation” and the values of D and S will both 
be zero (0.0). 

Now consider two alternative scenarios for how the same population could be 
rearranged to create a high level of uneven distribution. The first scenario pro- 
duces a pattern of “prototypical segregation” — displacement from even distribu- 
tion with substantial group separation and area racial polarization. It involves 
taking 49 of the 100 neighborhoods and exchanging the Black residents in these 
neighborhoods with White residents in one of the remaining 51 neighborhoods. 
This will leave 49 “above-parity” neighborhoods with 1000 Whites and no Blacks, 
50 “parity” neighborhoods with 980 Whites and 20 Blacks, and 1 “below-parity” 
neighborhood with no Whites and 1000 Blacks. The resulting value of D will be 
50 and the value of S also will be 50. The combination of S= D signals a residen- 
tial pattern of uneven distribution with the maximum polarization possible at this 
level of displacement. 
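
The index values for even distribution and for this first scenario can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, and S is scored as the White-Black difference in mean area proportion White.

```python
def dissimilarity(areas):
    """D = 0.5 * sum of |w_i/W - b_i/B| over areas given as (white, black) counts."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    return 0.5 * sum(abs(w / W - b / B) for w, b in areas)


def separation(areas):
    """S = White mean of p minus Black mean of p, where p is area proportion White."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    white_mean = sum(w * w / (w + b) for w, b in areas if w + b > 0) / W
    black_mean = sum(b * w / (w + b) for w, b in areas if w + b > 0) / B
    return white_mean - black_mean


# Exact even distribution: 100 neighborhoods with 980 Whites and 20 Blacks each.
even = [(980, 20)] * 100

# First scenario: 49 all-White areas, 50 parity areas, 1 all-Black area.
prototypical = [(1000, 0)] * 49 + [(980, 20)] * 50 + [(0, 1000)]

d_even, s_even = dissimilarity(even), separation(even)
d_proto, s_proto = dissimilarity(prototypical), separation(prototypical)

print("Even distribution: D =", round(100 * d_even, 1), "S =", round(100 * s_even, 1))
print("First scenario:    D =", round(100 * d_proto, 1), "S =", round(100 * s_proto, 1))
```

Both indices are zero under even distribution and both equal 50 in the first scenario, matching the values given in the text.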

Note that the pattern is logically easy to create even though the Black population 
is small.1 I will review empirical examples along these lines in a couple of case studies 
considered below. The key feature of the situation is that the Black residents 
displaced into “below-parity” areas are concentrated in a small number of homoge- 
neous areas — a single area in this hypothetical case — creating the pattern associated 
with prototypical segregation. 

The second scenario I consider produces uneven distribution in the form of
“displacement without separation” or “dispersed displacement”. In this situation a
larger fraction of the Black population lives in “below-parity” areas where Whites 
are under-represented (and Blacks are over-represented) but at the same time there 
is minimal group separation and no neighborhood polarization. This scenario 
involves taking 50 of the 100 initial neighborhoods and exchanging the Black resi- 
dents in these neighborhoods with White residents in the other 50 neighborhoods. 
In this case, however, the exchanges are implemented so no single neighborhood 
gains more than 20 new Black residents or loses more than 20 White residents.
Implementing these exchanges will leave 50 “above-parity” neighborhoods with 


1 All that is required is that the size of the minority population exceeds the size of the typical
neighborhood. In this example, the size of the Black population (2000) is twice the size of the typical
neighborhood (1000). 
1000 Whites and no Blacks, and 50 “below-parity” neighborhoods with 960 Whites
and 40 Blacks. 

In contrast to the first scenario, Black households displaced into “below-parity” 
areas are dispersed “thinly” across areas that are overwhelmingly White in terms of 
racial composition. As a result, displacement is extensive and affects half of the 
Black population but it does not produce group separation because it does not con- 
centrate displaced Black households in areas that are predominantly Black. The 
resulting value of D for this scenario will be 51 and the value of S will be 2. Note 
that D is high under this scenario and in fact is slightly higher than in the first sce- 
nario that produced prototypical segregation. In contrast, S is much lower and indi- 
cates extremely low group separation. The resulting combination of high-D, low-S 
indicates uneven distribution with extensive displacement but minimal group sepa- 
ration and residential polarization. 
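
The index values reported for the two scenarios can be checked with a short computation. The sketch below is illustrative only. It assumes the conventional computing formula for D (one half the sum of absolute differences between each group's area shares) and computes S as the White-Black difference in group means on area proportion White (p). The function name d_and_s and the list-of-tuples layout are hypothetical conveniences introduced here, not part of the original analysis.

```python
def d_and_s(areas):
    """Return (D, S) on a 0-100 scale.

    areas: list of (white_count, black_count) tuples, one per neighborhood.
    Assumes every neighborhood has at least one resident.
    """
    total_white = sum(w for w, b in areas)
    total_black = sum(b for w, b in areas)

    # D: half the sum of absolute differences in each group's area shares.
    d = 0.5 * sum(abs(w / total_white - b / total_black) for w, b in areas)

    # S: White mean minus Black mean of area proportion White (p).
    white_mean_p = sum(w * (w / (w + b)) for w, b in areas) / total_white
    black_mean_p = sum(b * (w / (w + b)) for w, b in areas) / total_black
    s = white_mean_p - black_mean_p

    return 100 * d, 100 * s


# Scenario 1: 49 all-White areas, 50 parity areas, 1 all-Black area.
scenario_1 = [(1000, 0)] * 49 + [(980, 20)] * 50 + [(0, 1000)]
# Scenario 2: 50 all-White areas, 50 areas with 960 Whites and 40 Blacks.
scenario_2 = [(1000, 0)] * 50 + [(960, 40)] * 50

d1, s1 = d_and_s(scenario_1)
d2, s2 = d_and_s(scenario_2)
print(f"Scenario 1: D = {d1:.1f}, S = {s1:.1f}")  # D = 50.0, S = 50.0
print(f"Scenario 2: D = {d2:.1f}, S = {s2:.1f}")  # D = 51.0, S = 2.0
```

Both scenarios hold the city totals fixed at 98,000 Whites and 2000 Blacks, so the contrast in S reflects only how the displaced Black residents are distributed across below-parity areas.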

Both scenarios of population residential distribution are simple and feasible 
demographically. However, if one assumes that Blacks are a minority population 
with little influence in the city’s political system, the sociological implications may 
vary markedly across the two scenarios. In the first scenario, half of Blacks reside 
in an all-Black ghetto or enclave. One can imagine that this makes them vulnerable 
to disadvantages in neighborhood conditions as neglect of the “Black” neighborhood
by city administrators would have no adverse impacts on Whites. In the second
scenario, all Blacks reside in neighborhoods that are 96% White. While these
areas are overwhelmingly White, they are technically “below parity” and contain a 
large share of the Black population. Accordingly, the residential patterns involved 
are fundamentally different from those produced in the first scenario. Black separation
from Whites and area racial polarization are essentially absent. As a result, one
can imagine that Blacks are less vulnerable to disadvantages in neighborhood con- 
ditions because city administrators are unlikely to neglect “below-parity” neighbor- 
hoods where Blacks are “over-represented” because this would have adverse 
impacts on many more Whites than Blacks. Additionally, for neighborhood out- 
comes that are truly shared, Whites and Blacks would share a common fate and even 
if Black interests were not served well, they would be “protected” from harm when 
Whites’ interests are satisfied.

“Fair enough,” someone might say. But can one imagine “real world” sociological
processes that would produce the two very different patterns of segregation associated
with these two scenarios? Again, the answer is yes. One example of a
potentially plausible historical scenario is the case of White-Black segregation in
northern cities before and after the Great Migration. Lieberson’s (1980, 1981) analyses
of Black residential patterns 1890-1930 suggest that the relative numbers for
Blacks in northern cities at the beginning of the time period were low and he specu- 
lates that due to the modest levels of Black presence Whites may not have perceived 
Blacks as a major threat to White residential areas. Accounts of the time suggest 
that, while Whites were hardly welcoming to Blacks, they did not yet engage widely 
in virulent anti-Black violence and other severe forms of discrimination that later 
would become widespread. The pre-Great Migration setting thus afforded opportunity
for wider dispersal of the Black population, which Lieberson reports is indicated
by low average scores for Black isolation in a set of 17 “leading non-southern
cities” for which data are available. Lieberson’s analysis indicates that Blacks initially
resided disproportionately in “below-parity” areas with moderate to high displacement,
but they did not at this time experience the high levels of concentration
and isolation in ghettos that would later come to characterize Northern and
Midwestern urban areas.2

Lieberson then notes that the Black population grew rapidly in relative size in 
these cities as the Great Migration progressed in subsequent decades. White con- 
cerns about residential encroachment by Blacks increased and acts of anti-Black 
violence and both legal and informal housing discrimination against Blacks became
more dramatic and more frequent. Increasingly, Blacks were driven from White 
residential areas and concentrated in predominantly Black areas that over time 
became large ghettos. With this, displacement as measured by D increased over this 
period. That is not surprising. What Lieberson points out as more intriguing is that 
Black isolation also increased at an even faster pace. More specifically and impor- 
tantly for this discussion, Black isolation in these cities increased at a pace well 
beyond that which would result from Black population growth alone. This is consis- 
tent with Blacks being increasingly disproportionately concentrated in predomi- 
nantly Black areas. By 1930 large ghettos were emerging across northern cities 
generally and familiar patterns of “prototypical segregation” came into being where 
previously they were not the norm. 

The account Lieberson builds by combining quantitative analysis of data on residential
distributions with historical information from the time period lays out a process
of Black displacement from even distribution changing over time from being
moderate and somewhat dispersed to being both more substantial and much more 
concentrated. This account is plausible and intriguing. But it also is quantitatively 
less than definitive because the analysis of residential patterns of the era is ham- 
pered by absence of data for small areas. Lieberson necessarily made use of data for 
larger areas such as “wards” in combination with historical accounts of relative 
dispersal of the Black population transitioning to concentration in ghettos. 

In light of this I give attention to some other examples that are quantitatively 
more definitive but less well known. The examples involve Latino migration to 
“new destination” communities in recent decades. Detailed analysis of block-level 
data over the period 1990-2010 shows that high-D, low-S patterns of dispersed 
displacement for White-Latino segregation are common in new destination com- 
munities and in many cases transition over time into high-D, high-S patterns of 
prototypical segregation (Fossett, Crowell, Saenz, and Zhang 2015). 

Several qualitative studies of Latino settlement in new destination communities 
including as examples a study of Garden City, Kansas (Broadway 1990), a study of 


2 Lieberson does not report values of the separation index. However, in the context of a near-binary
White-Black city composition, overall isolation is a close proxy for pair-wise isolation. When it is 
low in comparison to its logical maximum of 1.0, as Lieberson reports, it implies that S also is low. 
Marshalltown, Iowa (Grey and Woodrick 2006), and a study of Durham, North
Carolina (Flippen and Parrado 2012) provide a basis for suggesting a plausible 
“composite” scenario of possible social dynamics underlying the quantitative patterns.3
In this composite scenario Latino individuals and families initially migrate in
small numbers drawn by economic opportunities. Since it is a new Latino destina- 
tion with minimal prior Latino presence, White-Latino ethnic relations are inchoate 
and not yet well-formed. Demographically, there are no pre-existing barrios or Latino
residential areas for Latino immigrants and migrants to settle in. The qualitative 
accounts noted above suggest that early arriving Latino families do not initially 
encounter strong, widespread discrimination in housing, possibly due to their small 
numbers and their novelty in the absence of established White-Latino relations. As 
a result, early-arriving Latino settlers tended to locate idiosyncratically following 
available affordable housing vacancies distributed across many neighborhoods. 
These early arriving Latino families and households did tend to live in “below- 
parity” areas. But, as confirmed by quantitative analysis of block-level data, they 
typically lived in areas that were predominantly White, often overwhelmingly 
White. Few Latinos at this time lived in predominantly Latino neighborhoods. 

This pattern produces a “classic” high-D, low-S index score combination associated
with the segregation pattern of high displacement without group separation and
area racial polarization. Quantitatively, it is fundamentally at odds with an alternative
and sociologically plausible scenario in which early arriving Latinos are concentrated
in rapidly forming barrio and enclave neighborhoods due to multiple
causes including, as two examples, housing discrimination based on linguistic and
cultural differences and dynamics of ethnic congregation based on mutual support and
ethnically structured flows of information regarding housing opportunities.

The key point to bear in mind is that empirical studies that rely solely on D cannot
differentiate between the two alternative scenarios. But the D-S comparison makes 
it possible to use data to sort the story out more carefully and the observed high-D, 
low-S outcomes are more consistent with the “dispersed displacement” scenario. 

Many new destinations continue to attract Latino migrants and experience steady, 
sometimes rapid, Latino population growth. As the Latino population grows, the 
White population often begins to take greater notice and becomes less tolerant of 
the presence of Latinos. Anti-immigrant and nativist sentiment increases and dis- 
crimination against Latinos in housing increases and constrains residential opportu- 
nities for Latino families and households. As Latino neighborhoods emerge, they 
may be attractive locations for settlement for later arriving Latino migrants, espe- 
cially those with limited English language skills. Such options were not available 
initially, of course, because the Latino presence was too limited. 

These complementary dynamics of increasing discrimination and immigrant
congregation can serve to concentrate larger shares of the Latino population
in predominantly Latino areas forming enclaves or barrios. As this transition


3 Special thanks to Cassidy Castiglione, an undergraduate research assistant who helped identify
these case studies during her participation in a National Science Foundation Research Experiences
for Undergraduates Summer Institute at Texas A&M University in summer 2015. 
occurs, the pattern of segregation also undergoes a transition wherein S rises faster
than D. Indeed, the value of D itself may remain relatively stable or may even fall. 
The reason for this is that displacement (that is, the White-Latino difference in the proportion
residing in “above-parity” areas) was already high. But the pattern of displacement
is changing from being dispersed to being concentrated. Over the span of
a few decades, the high-D, low-S pattern of dispersed displacement for Latinos may 
then shift to a high-D, high-S combination of “prototypical segregation.” The data 
reviewed in Fossett, Crowell, Saenz, and Zhang (2015) indicate that the quantitative 
trend just described can be seen across many Latino new destinations over the 
period 1990-2010. 

These are just two examples of how possible, and I argue plausible, scenarios for 
social dynamics and trends could potentially produce White-Minority uneven dis- 
tribution in the form of both “dispersed displacement without separation” and “con- 
centrated displacement” resulting in “prototypical segregation”. Accordingly, 
sociologists should be mindful of the possibilities and should consider systemati- 
cally examining segregation indices that can reveal the presence of these distinctive 
residential patterns. The easiest option for doing so is to examine both D and S and 
note when instances of D-S concordance and discordance are found. 


8.3 A Practical Issue When Comparing D and S - Size 
of Spatial Units 


Values of S and D can and sometimes do disagree. When the differences are large, 
the discrepancy will always be in a particular direction; D will be high and S will be 
low. This outcome is rich with sociological implications but its occurrence is rarely 
discussed. The example introduced earlier in which I contrasted median scores for 
White-Black segregation with White-Asian segregation illustrated this point. D was 
high for both group comparisons with scores of 72.1 and 64.6, respectively. In con- 
trast, S for White-Black segregation (46.4) was more than three times higher than S 
for White-Asian segregation (13.2). This result suggests something potentially 
important about the difference between White-Black segregation and White-Asian 
segregation. It is that consistently high levels of displacement from even distribution 
are evident in both comparisons, but group separation and residential polarization 
are present only in White-Black segregation. Uneven distribution for White-Asian 
segregation does not involve group separation and residential polarization. Instead, 
Asian displacement from parity on area proportion White (p) involves dispersed 
displacement with quantitatively small departures from parity. Consequently, 
Asians live alongside Whites and experience similar residential outcomes. Blacks 
experience similar extensiveness of displacement from parity on area proportion 
White (p), but the departures from parity are much larger quantitatively and as a 
result Blacks do not live alongside Whites and do not experience similar residential 
outcomes with regard to area proportion White (and presumably also with area 
characteristics that are correlated with area proportion White). Based on this, it is
reasonable to conclude that the potential for differences in life chances and other 
consequences to arise from segregation is much greater for Blacks than for
Asians even though typical values on D are relatively close. 

This example along with the examples discussed in the preceding section of this 
chapter make a compelling case for the value of comparing S with D. However, I 
now caution that, before researchers finalize conclusions based on comparing D and 
S, they should take care to review certain aspects of study design to make sure that the
conclusions offered will be sound. The aspect of research design to review is the 
comparison of group size and the population size of the spatial units used to assess 
segregation. This aspect of research design is potentially important for both S and 
for D. But its consequences can be different for D and S and in some conditions can 
exaggerate D-S differences. 

It is of course well known that using larger spatial units will result in lower seg- 
regation scores for any index of uneven distribution. Conventional wisdom is that 
this is not generally a major concern so long as it is reasonable to assume that the 
effect is approximately constant across cases. In that situation, researchers will 
know that overall levels of segregation will be lower, but at the same time they can 
expect that comparisons across cities or for a given city over time will still reveal 
fundamental variations in patterns and trends over time. 

Unfortunately, it is not always reasonable to assume that the impact of areal unit 
choice is approximately constant across measures or across individual cases. One 
potentially serious problem can arise when spatial units used to measure segregation 
are large in relation to overall group size.4 It is that segregation index scores will be
misleadingly biased downward when smaller homogeneous regions are “hidden” within
larger heterogeneous areas. The problem affects both D and S but not to the same 
degree. The previous chapter noted that S is sensitive to large differences in area 
racial composition that reflect area racial polarization and group residential separa- 
tion. But measurement of polarized differences is susceptible to being diminished 
when smaller homogeneous areas occur within larger units. This leads to lower 
values on D as well as S. But in this case, D is protected by its crudity as, whether 
due to true social dynamics or due to limitations of research design, reductions in 
area polarization only impact D when the associated changes cause one area to cross 
from one side of overall city racial composition (P) to the other. In essence, using 
areal units that are “too large” imposes an artifactual “ceiling” on scores for group 
separation and neighborhood polarization by pulling area-specific values on racial 
composition (p) toward the grand mean (P). 

The problem of underestimating segregation will be worse under at least two 
conditions. The first is when segregation is manifest at a relatively low spatial 
scale — for example, at the block level — and segregation also follows a pattern of 
small-scale “checkering” instead of large-scale clustering. In this situation the 
aggregation of smaller homogeneous units within larger heterogeneous units can 


4 The key issue is absolute group size, not relative size. However, the two often go hand in hand and
so the issue often will be salient when relative group size is small. 
reduce values of both D and S dramatically. Fortunately, the practical consequence
is usually modest because segregation patterns in US cities are characterized more 
by large-scale clustering than by small-scale checkering. 

The second condition is when segregation patterns include homogeneous regions 
that are smaller than the areal units used in the study design. The practical conse- 
quence of this problem is greater when groups are small in size. Even when area 
polarization is substantial and homogeneous areas for a group are clustered, the 
value of S cannot reach its maximum value if the overall size of the smaller group 
does not comfortably exceed the population size of the areal units used to assess 
segregation. As noted above, the impact will be potentially important for both D and 
S, but more so for S. As a result, using large spatial units when investigating segregation
involving small groups can distort comparisons of D and S, making D-S differences
appear larger than would be the case if a better research design were used.

In light of this, researchers should give the issue careful thought when making 
decisions about research design. Happily, the problem is easy to understand and, 
once appreciated, major problems are easy to avoid. The solution is to confirm that 
the spatial units used to assess segregation have the logical capacity to capture 
group separation and residential polarization for the groups in the comparison. 

Brief discussion of a hypothetical example can illustrate the key issues. Assume 
a hypothetical city with 4 equal size census tracts each containing 4000 people. 
Also assume that each tract is subdivided into 4 equal size block groups (for a total 
of 16 block groups) each containing 1000 people. Next assume that the city has two 
groups, one with 15,000 people and one with 1000 people, and then assume that 
everyone in the smaller group resides in a single block group. Finally, assume that 
each block group is divided into 10 equal size blocks each containing 100 people. 

In this example, S and D will both register perfect segregation (D = S = 100.0) 
if their values are computed using block data or block group data. However, if they 
are computed using tract data their values will be 80.0 for D and 20.0 for S. This 
contrast illustrates two points. The first is that both displacement (D) and separation 
(S) can be measured without error if the spatial unit used in the research is “right 
sized” as it is in this example when using blocks and block groups. 
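
The values quoted for this hypothetical city can be reproduced by building the block group counts and then aggregating or splitting them. The sketch below is a minimal illustration under stated assumptions: D is computed with the conventional half-sum-of-absolute-differences formula, S is computed as the difference between the two groups' means on area proportion in the larger group, and the helper names d_and_s and aggregate are hypothetical names introduced here.

```python
def d_and_s(areas):
    """Return (D, S) on a 0-100 scale for (larger_group, smaller_group) counts."""
    total_a = sum(a for a, b in areas)
    total_b = sum(b for a, b in areas)
    # D: half the sum of absolute differences in each group's area shares.
    d = 0.5 * sum(abs(a / total_a - b / total_b) for a, b in areas)
    # S: difference of group means on area proportion in the larger group.
    mean_a = sum(a * (a / (a + b)) for a, b in areas) / total_a
    mean_b = sum(b * (a / (a + b)) for a, b in areas) / total_b
    return 100 * d, 100 * (mean_a - mean_b)


def aggregate(areas, per_unit):
    """Merge consecutive runs of `per_unit` areas into larger spatial units."""
    merged = []
    for i in range(0, len(areas), per_unit):
        chunk = areas[i:i + per_unit]
        merged.append((sum(a for a, _ in chunk), sum(b for _, b in chunk)))
    return merged


# 16 block groups of 1000 people; the smaller group fills exactly one of them.
block_groups = [(1000, 0)] * 15 + [(0, 1000)]
# Each block group splits into 10 blocks of 100 people with the same composition.
blocks = [(a // 10, b // 10) for a, b in block_groups for _ in range(10)]
# Each tract combines 4 block groups (4000 people).
tracts = aggregate(block_groups, 4)

results = {
    "blocks": d_and_s(blocks),
    "block groups": d_and_s(block_groups),
    "tracts": d_and_s(tracts),
}
for label, (d, s) in results.items():
    print(f"{label}: D = {d:.1f}, S = {s:.1f}")
# blocks: D = 100.0, S = 100.0
# block groups: D = 100.0, S = 100.0
# tracts: D = 80.0, S = 20.0
```

At the tract level the one mixed tract is 75% larger-group, so S drops by 80 points while D drops by only 20 points.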

The second point is that when the spatial unit used is too large — meaning specifi- 
cally that the population of the smaller group is too small to fill multiple areas, as is 
the case when using tracts in this example — the value of all indices of uneven dis- 
tribution will be underestimated. Furthermore, while both D and S will be underes- 
timated based on this problem with research design, the impact will tend to be more 
dramatic for S for reasons given above. This in turn can distort the comparison of D
and S. In the worst case scenario, it would produce an incorrect impression that a 
high D, low S situation of “dispersed displacement” or “displacement without sepa- 
ration” prevails when a better research design would reveal a high-D, high-S com- 
bination indicating a pattern of “prototypical segregation”. 

A simple practice can guard against the problem: avoid using spatial units that
are too large to reveal group separation and neighborhood polarization involving 
small groups. A practical rule of thumb is that typical population size for spatial 
units should be one-third to one-fifth the total size of the smaller group. Equivalently,
group size should be 3-5 times larger than typical area population size. When this 
condition is met, it will be possible to detect group separation and neighborhood 
polarization when it is present. However, if the spatial units are too large (that is, if
their typical population size approaches or is larger than the size of the smaller
group), it will be impossible to fully “see” group separation and residential polarization
when it is present.
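
The rule of thumb can be written as a simple screening check. The sketch below is a hypothetical helper, not a procedure taken from the original analysis. The function name units_right_sized and the default threshold of 3 (the lower end of the 3-5 range) are assumptions made for illustration. The rule is a conservative screen: units that fail it may still register separation when their boundaries happen to coincide with a homogeneous area, as the block groups do in the hypothetical city above.

```python
def units_right_sized(smaller_group_size, typical_unit_population, min_ratio=3.0):
    """Return True when the smaller group is at least `min_ratio` times the
    typical population of the spatial unit (the text suggests 3 to 5)."""
    return smaller_group_size >= min_ratio * typical_unit_population


# Hypothetical city from the text: the smaller group has 1000 members.
smaller_group = 1000
for label, unit_pop in [("blocks", 100), ("tracts", 4000)]:
    ratio = smaller_group / unit_pop
    passes = units_right_sized(smaller_group, unit_pop)
    print(f"{label}: group/unit ratio = {ratio:.2f}, passes rule of thumb: {passes}")
# blocks: group/unit ratio = 10.00, passes rule of thumb: True
# tracts: group/unit ratio = 0.25, passes rule of thumb: False
```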


8.3.1 A Case Study of White-Black Segregation in Cullman
County, Alabama


I now review a real world example that illustrates both the problem and the solution. 
The case is White-Black segregation in Cullman County, Alabama, which constitutes
the Cullman, Alabama core-based statistical area (CBSA). In 2000 the county population
included 73,940 Whites and 726 Blacks with Blacks comprising less than 1%
of the population. A New York Times article (Dawidoff 2010) reports that Black resi- 
dents of the area describe the county as having a racist history including vigorous 
KKK activities and a hostile attitude toward Blacks in the Jim Crow era and beyond 
as exemplified by the fact of “sundown town” signs being posted in Cullman, the 
largest urban center of the county, well into the 1970s.5 Historically, this caused
Blacks to be excluded from the city of Cullman and the demographic legacy remains 
evident in contemporary residential distributions for the county. As of 2000, a 
majority of the Black population residing in the county lived in or near the small 
city of Colony, an outlying hamlet traditionally known as a “safe haven” for Blacks 
located in the hilly countryside to the south of Cullman, which was originally set- 
tled by former slaves who received land during Reconstruction following the Civil 
War (Kaetz 2013; Dawidoff 2010). 

The social history of the county explains why Blacks are few in number in the 
local population and it provides a basis for expecting that the small Black popula- 
tion present would be residentially separated from Whites. This is in fact the case. 
But it is crucial to use “right sized” spatial units to “see” this pattern in a quantitative 
analysis of White-Black segregation. Group separation and residential polarization 
are readily evident in analysis using data for census blocks (S = 62.6). But they are less
evident in analysis using data for census block groups (S = 21.0) and are not evident
at all using data for census tracts (S = 5.8). In comparison, values of D do not
differ so dramatically by type of spatial unit. The progression for D is 94.2 for 
blocks, 82.6 for block groups, and 73.8 for tracts. Values for both S and D are lower 
when using tracts instead of blocks. But the difference between block- and tract- 
based scores for D is modest in comparison to the same difference observed for 


5 Loewen’s (2005) study of “Sundown” towns discusses Cullman and many other cases and notes
that sundown signs proclaimed messages such as “Nigger Don’t Let the Sun Go Down on You in 
This Town” and were commonplace in Alabama and many other states of the South and Midwest.




S. The progression in the D-S difference is 31.6 for blocks, 61.6 for block groups,
and 68.0 for tracts. Recalling guidelines for D-S comparison offered in earlier chap- 
ters, the comparison based on block data indicates high-D, high-S and “prototypical 
segregation” based on a pattern of concentrated displacement from even distribution.
In contrast, the comparison based on tract data suggests high-D, low-S consistent
with a pattern of “dispersed displacement” or “displacement without
separation”.

The explanation for these results is simple: the typical population sizes of census
tracts and even census block groups are too large to detect White-Black residential 
separation in a situation where the Black population is small. The typical tract in 
Cullman County has a population of approximately 4,000 so, even if all Blacks in 
the county lived in a single tract, they would live in a predominantly White tract. In 
contrast, the typical block in Cullman County has approximately 24-28 people 
(similar to block data for other communities around the country) and thus block- 
level analysis has the logical capacity to easily detect White-Black separation and 
residential polarization if it is present. And it definitely is. Out of 2,449 populated 
blocks in Cullman County in 2000, a subset of twelve (12) blocks that were at least 
75% Black (and with at least 10 residents) contained over 370 Blacks, over half of
the Black population in the county. GIS-based mapping of population distribution 
for the Cullman CBSA (not reviewed here) reveals that these 12 blocks are located 
in a cluster of contiguous blocks in and around the hamlet of Colony. The high value 
of the separation index (S = 62.6 ) computed from block data registers this pattern 
of group separation and residential polarization clearly and unambiguously. Its 
interpretation is simple, straightforward, and sociologically meaningful. Whites and 
Blacks in Cullman County are residentially separated from each other and members 
of both groups primarily live in racially polarized neighborhoods where their own 
group predominates. 
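
A rough calculation suggests how low the tract-level ceiling on S is in a setting like this one. The sketch below is a back-of-the-envelope illustration, not a result from the original analysis. It uses the county counts and the approximate tract size of 4,000 given above, and it assumes a deliberately extreme hypothetical: every Black resident lives in one tract of exactly 4,000 people, all other tracts are entirely White, and other groups are ignored.

```python
WHITES = 73_940     # county count for 2000 reported in the text
BLACKS = 726        # county count for 2000 reported in the text
TRACT_POP = 4_000   # approximate typical tract population from the text

# Hypothetical extreme: all Black residents share a single tract, which is
# filled out with White residents; every other tract is all White (assumption).
whites_in_shared_tract = TRACT_POP - BLACKS
p_shared = whites_in_shared_tract / TRACT_POP  # proportion White in that tract

# Group means on area proportion White (p).
white_mean_p = (
    (WHITES - whites_in_shared_tract) * 1.0 + whites_in_shared_tract * p_shared
) / WHITES
black_mean_p = p_shared

s_ceiling = 100 * (white_mean_p - black_mean_p)
print(f"Proportion White in the shared tract: {p_shared:.3f}")
print(f"Approximate tract-level ceiling on S: {s_ceiling:.1f}")
```

Under these assumptions the shared tract is still about 82% White and the ceiling on S is roughly 17, far below the block-based value of 62.6. Tract data of this size therefore cannot register the separation that block data reveal, whatever the true residential pattern.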

The lesson from this case is that tracts can be too large to detect White-Black 
residential separation even when the size of the Black population exceeds the size 
of the typical tract. This problem can occur under at least two conditions. One is 
when segregation involves “checkering” occurring at a spatial scale smaller than the 
tract. Checkering could occur for example when multiple small predominantly 
Black neighborhoods arise in different parts of the city. Extreme clustering would 
occur when predominantly Black neighborhoods are contiguous and form a single
Black ghetto. Analysis using block level data will detect segregation in both cases. 
Analysis using tract data will detect segregation only in the second case. 

A second condition can further complicate the situation. It is when tract boundaries
do not coincide with the perimeters of clusters of homogeneous subareas (e.g.,
blocks). Census guidelines call for tract boundaries to follow social homogeneity in 
population distribution when feasible. But even at time of original “founding” 
boundary alignment may not be perfect because other competing concerns (e.g., 
tract population size, features of the natural and built environment, political bound- 
aries, etc.) also must be taken into account. Even when boundaries initially delimit 
homogeneous populations, this can change over decades based on dynamics of 
neighborhood change and population redistribution. Analysis using block level data 
will be minimally affected by this problem because blocks are small in both spatial extent and population
size. Analysis using tract level data can be affected to a non-negligible degree,
especially when minority population size is small. 


8.3.2 A Case Study of White-Minority Segregation 
in Palacios TX 


Palacios, Texas is a small city found in the southwest corner of Matagorda County,
which constitutes the Bay City, Texas CBSA. The case of Palacios is interesting
because it is characterized by a segregation pattern not seen frequently in empirical 
studies — a high-D, high-S combination for White-Asian segregation in a commu- 
nity with a relatively small Asian population. Before proceeding, I first pause to 
make the case that it is reasonable to examine the city of Palacios separately from 
the rest of the Bay City CBSA. Palacios is a small, spatially isolated coastal community
located on Matagorda Bay some 28 miles away from the larger, inland community
of Bay City. Significantly, Palacios and the nearby region are home to
approximately 16% of the total population in the CBSA but about 79% of the
CBSA’s Asian population.6 The counts by group for Palacios are 2,895 Latinos,
2,236 Whites, 706 Asians, and 239 Blacks. 

The D-S combinations for all White-Minority segregation comparisons in 
Palacios follow patterns of “prototypical segregation.” The White-Black segregation 
comparison involves a high-D, high-S combination (D=79.6, S=50.1) and the
White-Latino comparison involves a medium-D, medium-S combination (D=54.9, S=39.9).
These are not particularly unusual for the region. What is unusual is that in Palacios 
White-Asian segregation also is characterized by a high-D, high-S combination 
(D=75.3, S=64.2) that is rarely seen for White-Asian comparisons. 

Close review of the residential pattern by GIS analysis and also with an in-person, 
on-site visit confirms what the quantitative analysis suggests; namely, White-Asian 
segregation in Palacios follows a prototypical pattern of extensive displacement 
from even distribution that is highly concentrated resulting in a high level of group 
separation and neighborhood racial polarization. GIS analysis confirmed by on-site 
review of contemporary residential patterns combined with review of historical 
materials reveals that the Asian population in Palacios has for at least three decades 
been concentrated in a small set of six adjoining blocks that are home to a thriving 
Vietnamese community that came into existence in 1976-1983 as a result of a refugee
settlement program.⁷

This example provides further evidence that segregation patterns can span a wide
range of logically possible outcomes in terms of D-S combinations and that valuable
insights can be gained by examining both displacement (D) and separation (S).
In the case of Palacios, Texas, the unusual high-D, high-S combination for White-Asian
segregation prompted a closer inspection. This in turn revealed an interesting
community history with social dynamics that serve to produce and perpetuate a
pattern of White-Asian segregation that is quite different from that seen in most
communities. In particular, White-Asian separation (S) is the highest of all
White-Minority comparisons and much higher than that for the White-Latino comparison,
and closer qualitative review confirms that the quantitative finding of high-D, high-S
identifies a city with a unique history of ethnic relations and residential segregation.


6 Population counts and other analyses reported below are based on the census tract in Matagorda
County, Texas, that contains all block groups overlapping with or adjoining Palacios.

7 This discussion draws on an article “A Shrimp Tale” by Robert Draper in the October 1996 issue
of Texas Monthly magazine which recounts the history of Vietnamese settlement in Palacios and
its reception by and impact on the local community.

This example also provides further evidence that the concern that values of the 
separation index (S) will necessarily be low when groups are small is clearly 
unfounded. The comparisons of D and S for Palacios, Texas show that these indices 
can reveal much about segregation of small groups in small communities so long as 
the research design uses spatial units that are appropriate for the research setting. In 
this case that requires using block data. When using block data, interesting and varied
patterns of segregation are revealed by contrasting values of D and S across
White-Minority comparisons. GIS analysis of group residential distributions and
in-person, on-site inspection of the residential patterns confirm the patterns
suggested by the D-S contrasts.

Indeed, the unusually high level of group separation in the White-Asian compari- 
son is both obvious and quite striking when one is “on the ground” in Palacios. But 
due to the small size of the group populations involved, all of these patterns would 
be missed if segregation were assessed using tract-level data or even block group- 
level data. A single tract includes all of Palacios and also the surrounding area so 
tract-level analysis is infeasible. The tract containing Palacios is comprised of six 
block groups so block-group analysis is technically possible. But it would be highly 
misleading. In 2000 the tract containing Palacios had 237 populated blocks. A small
cluster of six contiguous blocks located on the northern side of the city forms a
Vietnamese enclave easily identified by GIS analysis and on-site inspection. The six
blocks contained over half (50.7%) of the Asian population in the Palacios area and
had a population of 41 (10.3%) non-Asians and 358 (89.7%) Asians. The enclave
cannot be identified using block group data because it is located in a block group
where the other blocks (not in the enclave) have a population of 888 (98.1%)
non-Asians and 17 (1.9%) Asians. Accordingly, computing D and S for the White-Asian
comparison using block group data yields values of 26.7 for D and 6.3 for S, and
similarly low values are obtained for the White-Black and White-Latino comparisons.
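
The arithmetic behind this masking effect can be sketched in a few lines. The counts below are the ones reported in the text; the pooled block-group percentage is derived here for illustration and is not a figure reported in the analysis, and the dictionary and function names are hypothetical.

```python
# Counts reported in the text for the block group that contains the enclave.
enclave = {"asian": 358, "non_asian": 41}        # the six enclave blocks combined
other_blocks = {"asian": 17, "non_asian": 888}   # remaining blocks in the same block group


def percent_asian(counts):
    """Return the Asian share of an area's population as a percentage."""
    total = counts["asian"] + counts["non_asian"]
    return 100.0 * counts["asian"] / total


# Pooling both sets of blocks mimics what block-group data would register.
block_group = {
    "asian": enclave["asian"] + other_blocks["asian"],
    "non_asian": enclave["non_asian"] + other_blocks["non_asian"],
}

enclave_pct = percent_asian(enclave)
other_pct = percent_asian(other_blocks)
block_group_pct = percent_asian(block_group)

print(f"Enclave blocks:      {enclave_pct:.1f}% Asian")
print(f"Other blocks:        {other_pct:.1f}% Asian")
print(f"Block group overall: {block_group_pct:.1f}% Asian")
```

At the block level the two sets of areas are close to the opposite extremes of racial composition; once pooled, the block group registers as a single mixed area in which Asians are a minority, which is why the enclave disappears from block-group measures.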


8.3.3 Reiterating the Importance of Using “Right-Sized” 
Spatial Units 


The takeaway point from these two quantitative case studies is that it is important to
use “right-sized” spatial units when assessing residential segregation and particularly
when using S to assess group separation and residential polarization for groups
that are small in size. The good news is that S will reliably detect residential
separation between two groups so long as the spatial units used in the research design
are appropriate for the analysis. In the cases just examined, block data readily revealed
patterns of segregation even when some of the groups in the segregation comparisons
were very small in overall population size.

Block data were once widely used in segregation analysis including most notably 
the landmark study by Taeuber and Taeuber (1965) and dozens of studies that used 
and supplemented these measures (e.g., Schnore and Evenson 1966; Roof 1972; 
Roof and Van Valey 1972; Sorenson et al. 1975). But in recent decades, with only 
occasional exceptions such as Lichter and colleagues (2010) and Allen and Turner 
(2012), segregation studies have relied primarily on tract-level data. The examples 
reviewed above highlight how the practice of using larger spatial units such as tracts 
and even block groups can limit the potential scope of segregation studies by creat- 
ing problems for assessing residential separation between groups when one group is 
small. This sometimes is mistakenly viewed as a problem inherent in the indices 
themselves. Indeed, some have raised concerns that the separation index will “nec- 
essarily” yield low values when segregation involves small groups. The examples 
just reviewed show this view is mistaken on two counts. First, to the extent that there 
is a problem, it is not limited to the separation index; it applies to all popular indices 
of uneven distribution. Second, the problem is not inherent in the indices; the prob- 
lem is with basic features of research design in failing to use spatial units appropri- 
ate for obtaining valid assessments of segregation. 

The analyses just reviewed demonstrate that both D and S can yield misleadingly low
values when computed using tract-level and block group-level data but will correctly 
signal the presence of substantial segregation when computed using block-level data. 
This suggests that studies should use block-level data to guard against the problem. But 
as noted above this practice has become uncommon. The prevailing use of tract-level 
data is partly due to the fact that census tabulations for tracts provide more detailed 
breakdowns of population groups. But another important factor is that methodological 
studies have noted problems that can arise when measuring segregation using small 
spatial units. Taeuber and Taeuber’s thorough discussion of issues in segregation mea- 
surement (1965: Appendix 1) noted one reason. It is that it can be difficult or even 
impossible to achieve even distribution with small areas and small groups because 
populations are distributed in indivisible, whole number “clumps” associated with indi- 
viduals, families, and households, not fractional parts, and this makes it difficult to 
exactly reproduce city-wide racial proportions in small areas. Winship (1977) pointed 
out a second reason that has been seen as more important. It is that indices measuring 
uneven distribution are inherently susceptible to undesirable, non-negligible upward 
bias when segregation is assessed using small spatial units. 

The potential undesirable impact of both factors is more consequential for D than
for S. But it is an important concern and, accordingly, I review it at length in
analyses I present in Chaps. 14, 15 and 16. I save the details of that discussion for
later. For now, I note that the new methods introduced in this monograph make it possible
to identify the underlying basis for the problem of index bias and introduce new
versions of popular indices that eliminate undesirable impact of bias while retaining
other desirable index properties such as familiar substantive interpretations. Based
on this, I have no reservations in advising researchers to use data for smaller spatial
units when investigating segregation involving small groups. Concerns about index
bias when using block-level data can be readily addressed using methods outlined
in this monograph.⁸


                                  Factor for Ratio of Group Population Count
                                  to Area Population Size
Type of Area                             3           5          10
Blocks @ 30 persons                     90         150         300
Block Groups @ 1,250 persons         3,750       6,250      12,500
Tracts @ 4,000 persons              12,000      20,000      40,000


Fig. 8.5 General guidelines for group population thresholds needed to assess displacement and
group separation and area racial polarization


8.3.4 More Practical Guidance for Using S 


The discussion to this point raises the concern that segregation in general, and group
separation and residential polarization in particular, may not always be assessed
accurately in studies that use tract data to investigate segregation involving small
groups. Earlier I suggested a “rule of thumb” that the size of the smaller
group in the analysis should be 3-5 times the size of the areal units used to assess
segregation. This informal guideline provides a basis for diagnosing the situation
and considering alternative options for research design. I summarize the implications
of this guideline for studies using blocks, block groups, and tracts in Fig. 8.5.
Note that the guidelines do not focus on relative size per se. That is appropriate
because for this issue relative size is not the true source of the problem. The
guidelines instead focus on group population counts and indicate that to be “safe”
the population size of both groups in the comparison should be at least 3-5 times the
typical population size for the areal unit used. In addition, I have added an even
more conservative factor of 10 to 1 and have listed in Fig. 8.5 the associated group
size “thresholds” for being able to “safely” analyze displacement from even
distribution and group separation and residential polarization when using data for
blocks, block groups, and tracts.


8 The results for the examples of block-level analysis discussed in this chapter are for “standard” 
versions of D and S, not the “unbiased” versions I introduce in Chap. 15. In these particular cases, 
the issue of bias does not distort the findings presented. So I use standard versions of D and S to 
avoid introducing unnecessary complication to the discussion here.

These calculations make it clear that fairly large city group counts are needed to
reliably assess displacement from even distribution and group separation and area
racial polarization with tract-level data. The “safe” threshold ranges from 12,000 to
40,000 depending on whether one chooses a liberal (3:1) or conservative (10:1)
ratio of group population size to typical area population size. Studies using census
tract data often include cases where the size of the smaller group in the comparison
falls below these thresholds, especially the conservative threshold. This raises
questions as to whether assessments of displacement from even distribution and group
separation and area racial polarization have been equally reliable across all cases in
past studies using tract-level data. The basis for concern is not as great when
segregation is measured using data for block groups because the thresholds for group size
requirements are lower. The basis for concern is smaller still when segregation is
assessed using block data because the thresholds for group size requirements are
very small. This indicates that using block data is the safe way to go on this aspect
of research design.
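
The thresholds in Fig. 8.5 follow from simple multiplication of a typical area population size by the chosen factor (for example, 3 × 1,250 = 3,750 for block groups at the liberal ratio). The sketch below reproduces that arithmetic and adds a small helper, `adequate_units`, that is my own illustrative device rather than part of the guidelines; it lists the spatial units whose threshold a given group count satisfies.

```python
# Typical area population sizes and ratio factors taken from Fig. 8.5.
TYPICAL_AREA_SIZE = {"Blocks": 30, "Block Groups": 1250, "Tracts": 4000}
FACTORS = (3, 5, 10)


def group_size_thresholds(area_sizes=TYPICAL_AREA_SIZE, factors=FACTORS):
    """Return {area type: {factor: minimum group count}} as factor x area size."""
    return {
        area: {factor: factor * size for factor in factors}
        for area, size in area_sizes.items()
    }


def adequate_units(smaller_group_count, factor=3, area_sizes=TYPICAL_AREA_SIZE):
    """List area types whose threshold is met by the smaller group's count.

    Hypothetical helper: it only applies the rule of thumb and says nothing
    about index bias or other design considerations.
    """
    return [
        area
        for area, size in area_sizes.items()
        if smaller_group_count >= factor * size
    ]


thresholds = group_size_thresholds()
for area, row in thresholds.items():
    print(area, row)

# The Asian population of the Palacios area (706) meets only the block threshold.
print(adequate_units(706, factor=3))
```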


8.4 A Simple Index of Polarization 


I conclude this chapter with a brief discussion of an alternative option for measuring 
group separation and area racial polarization. I offer the alternative because I recog- 
nize that D is popular in part because it is easy to compute and explain. In my opin- 
ion, S also is attractive on these counts and compares favorably with D, especially 
when both indices are presented in the difference of means formulation. But I also 
recognize that others may find it useful to have an alternative measure of separation
when even greater simplicity is a priority. I suggest such a measure here terming it 
the “Polarization” index. 

The index is constructed as follows. First, for both groups, calculate the percentage
in each group that resides in areas where their group predominates based on a
user-chosen “threshold” or “cut-point” such as 65% same-group presence (POL65).
To illustrate using Cullman County, I first calculate the percentage of Whites that
reside in areas that are at least 65% White and I then calculate the percentage of
Blacks that reside in areas that are at least 65% Black. The results show that 99.9%
of Whites lived in predominantly White areas and 61.8% of Blacks lived in
predominantly Black areas. The value of the polarization index is set to the lower of
these two values and so POL65 is 61.8.
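
The construction just described can be sketched as a short function. This is a minimal illustration under stated assumptions: same-group presence is computed on a pairwise basis (the two groups in the comparison only), the area counts are hypothetical values invented for the example, and the function name `polarization_index` is my own label.

```python
def polarization_index(counts_a, counts_b, threshold=65.0):
    """Cut-point polarization index for two groups.

    counts_a, counts_b: per-area population counts for each group, listed in
    the same area order. Returns (index, pct_a, pct_b), where pct_a is the
    percentage of group A living in areas that are at least `threshold`
    percent group A (pairwise basis), and likewise for group B.
    """
    in_own_a = 0
    in_own_b = 0
    for a, b in zip(counts_a, counts_b):
        pair_total = a + b
        if pair_total == 0:
            continue  # unpopulated area contributes nothing
        if 100.0 * a / pair_total >= threshold:
            in_own_a += a
        if 100.0 * b / pair_total >= threshold:
            in_own_b += b
    pct_a = 100.0 * in_own_a / sum(counts_a)
    pct_b = 100.0 * in_own_b / sum(counts_b)
    # The index is the lower of the two same-group percentages.
    return min(pct_a, pct_b), pct_a, pct_b


# Hypothetical counts for five areas (not data from the monograph).
whites = [900, 800, 20, 10, 70]
blacks = [10, 20, 60, 80, 30]

pol65, pct_white, pct_black = polarization_index(whites, blacks, threshold=65.0)
print(f"Whites in predominantly White areas: {pct_white:.1f}%")
print(f"Blacks in predominantly Black areas: {pct_black:.1f}%")
print(f"POL65 = {pol65:.1f}")
```

Taking the minimum means a high score requires both groups to be concentrated in same-group areas, so a large majority group living in majority areas cannot by itself produce a high value.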

The logic for this measure is as follows. If the residential distributions of the two
groups are polarized, the percentage residing in predominantly same-group
neighborhoods must be high for both groups. If the distributions are not polarized, the
number will be low for at least one of the two groups. So taking the minimum of the
two values can serve as a simple “polarization” index. In addition to being easy to
compute, the score of 61.8 for White-Black polarization (at 65%) in Cullman
County is easy to interpret; it indicates that at least 61.8% of both groups reside in
neighborhoods where their group predominates (at a level of 65% or higher).

The main benefit of this measure is that it may be useful for helping broad audiences
gain an immediate intuitive grasp of group separation and neighborhood
polarization. I have conducted detailed analyses (not reported here) of the behavior
of this simple index of separation and polarization and I find that the measure can
be highly serviceable. It ranks cases in a consistent way over different user choices
for the threshold for same-group presence and its values typically track the separation
index (S) fairly closely. For example, when using threshold levels for group
predominance over the range of 55-75%, the index values for Cullman fall between
53.3 and 62.5 and thus are roughly comparable to the value of S at 61.8.⁹

Consistency between S and POL also is seen when considering a broader range
of cases. For the large, multi-year CBSA data set introduced earlier in Chap. 6, the
correlations among “cut point” polarization indices using thresholds set at 5 point
increments over the range 55-80% ranged from 0.93 to 0.94. Of course, while
these correlations are very high, they are not perfect. That is to be expected because
S registers separation and polarization across the full spectrum of area racial
composition, not just in relation to a single threshold value. The trade-off then is
between precision of measurement (S) and ease of discussion and presentation
(“cut point” polarization indices).

Chapter 9
Unifying Micro-level and Macro-level 
Analyses of Segregation 


Casting segregation indices in the difference of means framework provides a valu- 
able option previously not available to researchers. It enables them to seamlessly 
connect macro-level segregation — as measured by the index score for a city — to 
micro-level processes of residential attainment. At the simplest level the value of 
any index placed in the difference of means framework can be obtained by perform- 
ing an individual-level attainment analysis that predicts index-relevant residential 
outcomes (y, scored from area group proportion p) for individuals with a dummy 
variable (0,1) for racial group membership. The regression coefficient for race will 
exactly equal the index score obtained by standard computing formulas. This intro- 
duces a new interpretation of segregation index scores; their values reflect the effect 
of race on the attainment of residential outcomes that determine the segregation 
index score for the city. 
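
The equivalence can be checked numerically. The sketch below compares standard computing formulas for S and D with the race coefficient from a frequency-weighted bivariate regression of y on a White dummy variable. Several things here are assumptions on my part: the area counts are hypothetical, S is scored as y = p (pairwise proportion White in the area), and D is scored as y = 1 when the area's p is at or above the city-wide proportion P and 0 otherwise, a scoring this section does not spell out. Scores are multiplied by 100 to match the scale used in the text.

```python
def separation_index(white, black):
    """Standard variance-ratio formula for S on a 0-100 scale."""
    W, B = sum(white), sum(black)
    T = W + B
    P = W / T
    num = sum(
        (w + b) * (w / (w + b) - P) ** 2
        for w, b in zip(white, black)
        if w + b > 0
    )
    return 100.0 * num / (T * P * (1 - P))


def dissimilarity_index(white, black):
    """Standard formula for D on a 0-100 scale."""
    W, B = sum(white), sum(black)
    return 100.0 * 0.5 * sum(abs(w / W - b / B) for w, b in zip(white, black))


def score_s(p, P):
    return 100.0 * p  # y for S: pairwise contact with Whites


def score_d(p, P):
    return 100.0 if p >= P else 0.0  # assumed y for D: parity or better


def race_coefficient(white, black, score):
    """Slope on the White dummy in a frequency-weighted regression of y."""
    W, B = sum(white), sum(black)
    P = W / (W + B)
    records = []  # (y, white_dummy, frequency weight)
    for w, b in zip(white, black):
        if w + b == 0:
            continue
        y = score(w / (w + b), P)
        records.append((y, 1, w))  # Whites living in this area
        records.append((y, 0, b))  # Blacks living in this area
    n = sum(f for _, _, f in records)
    mean_x = sum(x * f for _, x, f in records) / n
    mean_y = sum(y * f for y, _, f in records) / n
    cov_xy = sum(f * (x - mean_x) * (y - mean_y) for y, x, f in records)
    var_x = sum(f * (x - mean_x) ** 2 for _, x, f in records)
    return cov_xy / var_x


# Hypothetical counts for five areas (not data from the monograph).
white = [900, 800, 20, 10, 70]
black = [10, 20, 60, 80, 30]

print(separation_index(white, black), race_coefficient(white, black, score_s))
print(dissimilarity_index(white, black), race_coefficient(white, black, score_d))
```

Because the only predictor is a 0/1 dummy, the regression slope is simply the difference between the two group means on y, which is why it coincides with the index score.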

Establishing the equivalence between macro-level measures of segregation
and the effect of race on residential attainments in a bivariate individual-level 
regression model paves the way for at least three important new options for segrega- 
tion analysis. The first is to give researchers the ability to extend and elaborate 
bivariate models to investigate segregation in more detail using multivariate analy- 
ses. These models make it possible for researchers to address fundamental questions 
that previously could not be directly investigated. For example, researchers can 
assess whether or not the impact of race on segregation-determining residential out- 
comes seen in the bivariate analysis continues to persist when controls are intro- 
duced for other relevant individual- and household-level social characteristics (e.g., 
age, education, income, marital status, household composition, nativity, etc.) that 
may exert independent influence on residential outcomes. 

A second new option for segregation analysis is to give researchers the opportu- 
nity to quantitatively dissect the underpinnings of segregation in more detail than 
has previously been possible. Specifically, researchers can use familiar tools of
standardization and decomposition analysis to assess how the index score for a city is
quantitatively linked to group differences in the resources each group brings to the
residential attainment process and to group differences in the parameters of the 
attainment process where resources (inputs) are converted to residential outcomes. 
Thus, one can develop improved answers to questions such as “Does segregation 
arise primarily because groups differ on income and other resources that affect resi- 
dential contact with the reference group?” Or, “Does segregation arise primarily 
because groups differ with respect to their ability to convert income and other 
resources into residential contact with the reference group?” Or, “Do both factors 
play an important role in creating segregation?” Questions of this sort have been 
raised for many decades. But answers have been unsatisfactory because the avail- 
able options for addressing the question have been crude and difficult to implement. 
The difference of means formulation provides new and superior options for devel- 
oping answers to these long-standing questions. 
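
A minimal sketch of how such a question might be approached follows. It uses a simple one-way standardization with two subgroups and hypothetical numbers invented for the example; it is not necessarily the decomposition method applied later in the chapter. The idea is to recompute the minority group's mean on y using the reference group's distribution across subgroups and to see how much of the observed gap remains.

```python
def weighted_mean(means, shares):
    """Group mean on y as the share-weighted average of subgroup means."""
    return sum(means[k] * shares[k] for k in means) / sum(shares.values())


# Hypothetical inputs (not data from the monograph).
white_shares = {"not_poor": 0.9, "poor": 0.1}    # distribution of White families
black_shares = {"not_poor": 0.7, "poor": 0.3}    # distribution of Black families
white_means = {"not_poor": 90.0, "poor": 85.0}   # mean contact with Whites (y)
black_means = {"not_poor": 35.0, "poor": 25.0}

white_mean = weighted_mean(white_means, white_shares)
black_mean = weighted_mean(black_means, black_shares)
observed_gap = white_mean - black_mean  # the index score

# Counterfactual: Black subgroup means combined with the White distribution.
black_mean_std = weighted_mean(black_means, white_shares)
standardized_gap = white_mean - black_mean_std

# Portion of the gap tied to group differences in composition (resources).
composition_component = observed_gap - standardized_gap

print(f"Observed gap:          {observed_gap:.1f}")
print(f"Standardized gap:      {standardized_gap:.1f}")
print(f"Composition component: {composition_component:.1f}")
```

In this invented case most of the gap survives standardization, which would point toward group differences in the attainment process rather than differences in resources; real data could of course show a different balance.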

A third new option for segregation analysis is for researchers to investigate 
cross-area and over-time variation in segregation in more detail using multi-level 
specifications of bivariate and multivariate segregation attainment models. 
Segregation attainment models are individual-level attainment models that predict 
the residential outcomes that exactly determine the level of segregation in a city. 
Multi-level specifications of the basic bivariate segregation attainment model enable 
researchers to investigate ecological variation in segregation by assessing how seg- 
regation — equated in this approach to the effect of race on segregation-determining 
residential outcomes — varies over time and across different cities depending on the 
time period and characteristics of the metropolitan area such as its size, rate of 
growth, industrial and occupational structure, unemployment rate, military pres- 
ence, etc. 

Multi-level specifications of individual-level, multivariate segregation attain- 
ment models make it possible to investigate these patterns in more detail and sophis- 
tication than ever before. Importantly, these models provide a superior approach for 
taking account of the role of non-racial social characteristics in shaping variation in 
segregation over time and across areas. Researchers routinely hypothesize that 
group differences on income, nativity, and other social characteristics may play a 
role in explaining cross-area variation in segregation. Currently these hypotheses are
assessed with aggregate-level models in which measures such as group income
ratios or percent foreign born for Latinos are used to predict segregation index
scores for cities. The difference of means framework and the associated new option 
of analyzing segregation via attainment models make it clear that this long-standing 
practice is fundamentally flawed and should be discontinued. 

Current practice carries risks of erroneous inference associated with the so-called 
“ecological fallacy” — the fallacy of using aggregate indicators to assess or control 
for the effects of variables that operate at the micro level. Researchers have relied on 
the aggregate-level approach to address these important questions because until 
now they did not have better options for analysis. Multi-level implementations of 
multivariate segregation attainment models now allow researchers to properly take 
account of variables that affect segregation-determining outcomes at the micro level
(e.g., income, nativity, English language ability, etc.) when investigating cross-area
and cross-time variation in segregation. 

The difference of means framework makes these three new options for segrega- 
tion analysis possible. I discuss the first two in more detail in the remainder of this 
chapter. I provide a detailed discussion of the third option in Chap. 10. 


9.1 New Ways to Work with Detailed Summary File 
Tabulations 


To begin I illustrate how the difference of means formulation makes it possible for 
researchers to investigate segregation in new ways by revisiting and expanding on 
the analysis of White-Minority segregation in Houston, Texas reported earlier in 
Chap. 5 (Tables 5.1, 5.2 and 5.3). The summary file tabulations underpinning these 
analyses provide more than just simple counts of families by race for census block 
groups. The tabulations also provide counts of families by poverty status, family 
type, and presence of related children separately by race.¹ The analysis of segregation
reported in Tables 5.1, 5.2 and 5.3 was simple and conventional. It assessed segregation
in terms of race differences in residential outcomes without consideration for the role
of the other social and economic characteristics available in the
tabulation. There was no need to do so because index scores for the overall level of 
segregation between groups can be calculated using just group counts by race over 
areas. Accordingly, the scores reported in Tables 5.2 and 5.3 were obtained by col- 
lapsing the original detailed tabulations to obtain just the marginals for race. 

The difference of means formulation of segregation indices makes it possible to 
draw on the detailed information in the full tabulation to gain a deeper understand- 
ing of how overall segregation is related to group differences in distribution across 
poverty status and family type. It has always been recognized, at least implicitly, 
that segregation arises out of group differences in distribution on individual residen- 
tial attainments. And it also is widely recognized that residential attainments may 
vary, not only by race, but also with social characteristics such as age, gender, edu- 
cation, income, family status, and so on. Accordingly, researchers extending back at 
least to Duncan and Duncan (1955) have always wished for the option to take 
account of the possible role of social characteristics other than race when investigat- 
ing racial segregation. They have been frustrated in this goal, however, because until 
now the macro-level outcome of segregation could not be directly linked to 
individual-level residential outcomes in a way that would allow researchers to 
undertake the kinds of quantitative analyses needed to explore the issues with 
greater detail and sophistication. 


1 Specifically, I draw on Tabulations P160 A-I of Census Summary File 3 of the 2000 Census.


Table 9.1 Descriptive statistics for poverty status and distribution of poverty status by family type
for Whites, Blacks, Latinos, and Asians in Houston, Texas, 2000

Variable                                         Whites     Blacks    Latinos     Asians
Distribution of families by social characteristics
  Percent families not in poverty                  95.8       80.9       80.2       90.4
  Percent families in poverty                       4.2       19.2       19.8        9.6
  Percent families married couple                  84.0       51.2       74.1       84.9
  Percent families with children                   47.3       62.0       69.2       59.3
Detailed distribution of families by poverty status and family type
 Families not in poverty by family type
  Married couple, no children                      43.2       19.1       16.0       28.6
  Married couple, children                         38.7       27.8       46.0       49.5
  Female headed, no children                        3.9        7.9        2.9        3.4
  Female headed, children                           6.1       19.9        7.4        3.7
  Other family type                                 4.0        6.2        7.9        5.3
 Families in poverty by family type
  Married couple, no children                       1.1        1.6        1.6        2.2
  Married couple, children                          1.1        2.7       10.6        4.6
  Female headed, no children                        0.3        1.7        0.5        0.5
  Female headed, children                           1.4       11.6        5.2        1.6
  Other family type                                 0.4        1.6        1.9        0.7
 Total                                            100.2      100.0      100.0      100.1
Sample N                                        627,613    195,928    294,931     55,746

Source: US Census 2000, Summary File 3


The difference of means framework provides a solution to this problem. Casting 
segregation index scores as a group difference of means on residential outcomes for 
individuals opens the door for researchers to apply a standard toolkit of methods 
that are currently used to investigate race differences on education, income, poverty 
status, and other socioeconomic outcomes. Specifically, researchers now can ana- 
lyze segregation by combining individual-level attainment analysis with demo- 
graphic techniques of standardization and components analysis to better assess the 
roles that race and other social characteristics play in determining segregation. 


9.2 Some Preliminaries 


Tables 9.1 and 9.2 present the relevant descriptive data for the case of Houston,
Texas. Table 9.1 documents that Whites, Blacks, Latinos, and Asians differ in their
distribution across categories of family type and poverty status. Table 9.2 documents
how averages on the residential outcomes (y) that determine the separation
index (S) vary across families grouped by family type, poverty status, and race.
Table 9.3 similarly documents how averages on the residential outcomes (y) that
determine the dissimilarity index (D) vary across families grouped by family type,
poverty status and race. In the difference of means framework the patterns in these
three tables carry clear and direct implications for segregation. The overall
segregation index score for the group comparison is determined by the group difference of
means on residential outcomes (y) and the mean for each racial group is in turn
determined by the weighted average of the subgroup means for that racial group.


Table 9.2 Means on pairwise contact with Whites (y) scored for the separation index (S) by
poverty status and family type for White-Minority comparisons, Houston, Texas, 2000

                                          Whites                Minority group
Family type                        Non-poverty     Poverty Non-poverty     Poverty
White-Black comparison
  Married couple, no children             89.9        87.4        34.2        19.7
  Married couple, children                91.3        86.0        41.3        29.4
  Female headed, no children              86.1        87.3        24.9        19.7
  Female headed, children                 87.7        83.7        31.1        23.0
  Other family type                       87.2        83.9        30.6        20.1
  All families                            89.9                    32.5
  Value of separation index (S)           57.4
White-Latino comparison
  Married couple, no children             81.2        72.5        46.9        30.7
  Married couple, children                83.9        74.0        42.6        30.2
  Female headed, no children              73.3        70.7        38.4        31.3
  Female headed, children                 77.6        71.9        41.6        31.6
  Other family type                       75.1        66.9        36.2        28.2
  All families                            81.1                    40.2
  Value of separation index (S)           40.9
White-Asian comparison
  Married couple, no children             93.9        94.5        71.2        61.8
  Married couple, children                93.9        94.0        71.7        62.3
  Female headed, no children              93.2        94.6        65.6        63.6
  Female headed, children                 92.7        94.5        70.4        58.2
  Other family type                       93.3        92.9        64.8        59.4
  All families                            93.8                    69.9
  Value of separation index (S)           23.9

Source: US Census 2000, Summary File 3
Table 9.3 Means on scaled pairwise contact with Whites (y) scored for the dissimilarity index (D)
by poverty status and family type for White-Minority comparisons, Houston, Texas, 2000

                                          Whites                Minority group
Family type                        Non-poverty     Poverty Non-poverty     Poverty
White-Black comparison
  Married couple, no children             87.5        82.2        20.2         8.8
  Married couple, children                90.6        80.6        24.2        14.5
  Female headed, no children              80.8        83.4         9.9         7.4
  Female headed, children                 84.6        77.8        13.5         8.8
  Other family type                       82.5        74.9        14.5         7.1
  All families                            87.7                    16.8
  Dissimilarity index (D)                 70.9
White-Latino comparison
  Married couple, no children             81.5        67.6        32.9        13.3
  Married couple, children                86.2        68.9        24.9        12.5
  Female headed, no children              68.4        69.1        22.3        17.2
  Female headed, children                 76.6        65.8        23.6        13.5
  Other family type                       72.5        56.9        18.5        11.1
  All families                            81.5                    23.1
  Dissimilarity index (D)                 58.4
White-Asian comparison
  Married couple, no children             75.3        78.8        19.2        13.0
  Married couple, children                75.6        78.8        17.1        13.7
  Female headed, no children              73.8        84.7        16.4        26.9
  Female headed, children                 70.7        78.6        15.7        12.1
  Other family type                       73.7        77.6        10.3         8.7
  All families                            75.2                    16.9
  Dissimilarity index (D)                 58.3

Source: US Census 2000, Summary File 3


From that vantage point the data presented in Tables 9.2 and 9.3 can be understood
as providing a simple “ANOVA-style” micro-level attainment analysis of residential
segregation as measured by the separation index (S) and the dissimilarity
index (D), respectively. The essence of the analysis is that individual families are
cross-classified by the “independent variables” of race, family type, and poverty
status and means on the “dependent variable” of scaled contact with Whites (y) are
reported for the subgroups that are broken out in the cross tabulation. The overall
group means reported in Table 9.2 in the rows labeled “All Families” reflect the
weighted sum of the subgroup means by family type and poverty status based on the
relative frequencies reported in Table 9.1. The difference between the two “overall”
group means yields the index score for the comparison. Thus, the score for the
separation index (S) for the White-Black comparison is 57.4 based on the difference
between Whites having mean (pairwise) contact with Whites of 89.9 compared to
32.5 for Blacks. Similarly, the score for the dissimilarity index (D) for the
White-Black comparison is 70.9 based on the difference between Whites having a mean of
87.7 on (scaled pairwise) contact with Whites compared to a mean of 16.8 for
Blacks.
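
The weighted-sum calculation can be reproduced directly from the published tables. The sketch below takes the detailed distributions for Whites and Blacks from Table 9.1 and the White-Black subgroup means from Table 9.2; the first poverty-row entry for Whites is read as 1.1, the value consistent with the column total of 100.2. Because the published figures are rounded to one decimal place, the results agree with the reported values only to rounding. The variable and function names are my own.

```python
# Percent of each group's families in each cell (Table 9.1). Order: the five
# family types for families not in poverty, then the same five for families
# in poverty.
white_shares = [43.2, 38.7, 3.9, 6.1, 4.0, 1.1, 1.1, 0.3, 1.4, 0.4]
black_shares = [19.1, 27.8, 7.9, 19.9, 6.2, 1.6, 2.7, 1.7, 11.6, 1.6]

# Subgroup means on y for the separation index (Table 9.2, White-Black
# comparison), listed in the same cell order.
white_means = [89.9, 91.3, 86.1, 87.7, 87.2, 87.4, 86.0, 87.3, 83.7, 83.9]
black_means = [34.2, 41.3, 24.9, 31.1, 30.6, 19.7, 29.4, 19.7, 23.0, 20.1]


def weighted_mean(values, weights):
    """Average of subgroup means weighted by subgroup relative frequency."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


white_mean = weighted_mean(white_means, white_shares)
black_mean = weighted_mean(black_means, black_shares)
s_white_black = white_mean - black_mean

print(f"Mean contact with Whites, Whites: {white_mean:.1f}")
print(f"Mean contact with Whites, Blacks: {black_mean:.1f}")
print(f"Separation index (S):             {s_white_black:.1f}")
```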

It is not standard practice to analyze overall segregation index scores as arising 
from group differences in the distribution of individual families across subgroups 
with different average levels on residential outcomes (y) of scaled contact with 
Whites. In light of this I briefly review how the analysis presented in Tables 9.1, 9.2, 
and 9.3 can be performed using census summary tables. To begin, the data contained
in the block group-level census summary tabulation must be reconstituted as
a micro-level data set for families. The first step is to recognize that the count for 
each “interior” cell in the full summary file tabulation represents a set of micro-level 
“cases” — families in this example — that have a particular configuration of social 
characteristics. The poverty status by family type summary file tabulation in ques- 
tion has eighteen (18) interior cells (note that tabulation marginals are excluded). 
The tabulation is repeated for all four racial groups yielding 72 separate “cases” 
(i.e., cells) for each block group. The final data set thus has one “record” for each 
interior cell in the summary file tabulation; that is a total of 72 separate records for 
each of the block groups in Houston. Each record has a unique combination for the 
characteristics of race, family type, and poverty status. The cell frequency indicates 
how many families with this unique combination of characteristics are found in 
each block group in the metropolitan area. 

Next a set of variables is coded for each of the records. The first variable is area 
of residence (i.e., the block group code). The second is “nfamilies” which is set to 
the value of the cell frequency for this case (i.e., the count of families in that cell of 
the tabulation). This will later be used as the frequency weight for the record when 
performing statistical calculations.2 Next a series of additional variables are coded 
to represent the social characteristics of each family — namely, their race, family 
type, poverty status, etc. — in the table. Each characteristic is coded as a separate 
variable and assigned values as appropriate for the needs of the analysis. Each 
record in the resulting data set represents a set of families that reside in a particular 
block group and hold a specific combination of social characteristics. 
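The reconstitution step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block group codes, category labels, and counts are invented, and the function name cells_to_records is hypothetical rather than part of any census toolkit. Each interior cell of the tabulation becomes one record, and the cell count is carried as the frequency weight "nfamilies".

```python
# Hypothetical summary tabulation for two block groups. Keys are
# (race, family type, poverty status) cells; values are family counts.
# All codes and counts are invented for illustration only.
tabulation = {
    "BG001": {
        ("White", "married_children", "nonpoor"): 120,
        ("White", "female_children", "poor"): 8,
        ("Black", "married_children", "nonpoor"): 15,
        ("Black", "female_children", "poor"): 6,
    },
    "BG002": {
        ("White", "married_children", "nonpoor"): 20,
        ("White", "female_children", "poor"): 4,
        ("Black", "married_children", "nonpoor"): 90,
        ("Black", "female_children", "poor"): 45,
    },
}


def cells_to_records(tabulation):
    """Return one record per interior cell per block group."""
    records = []
    for block_group, cells in tabulation.items():
        for (race, family_type, poverty), count in cells.items():
            records.append({
                "block_group": block_group,   # area of residence
                "nfamilies": count,           # frequency weight for the record
                "race": race,
                "family_type": family_type,
                "poverty": poverty,
            })
    return records


records = cells_to_records(tabulation)
total_families = sum(r["nfamilies"] for r in records)
```

Because each record stands for "nfamilies" identical families, any statistic computed with these weights matches what would be obtained from a file with one row per family.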

The variables that register social characteristics will serve as “independent” vari- 
ables in micro-level residential attainment analyses. They may be coded a variety of 
equivalent ways. I created dummy (0,1) variables for race to select records for 
Whites, Blacks, Latinos, or Asians as relevant. I also created a dummy variable for 
“poverty” and I similarly created a set of dummy variables to represent the five 
categories of family type. Finally, I also created additional dummy variables to cap- 
ture the possible interaction of poverty status and family type. Viewed from the 
perspective of analysis of variance (ANOVA) the set of dummy variables includes 
all combinations needed to estimate a “saturated” ANOVA model which includes 
all main effects and all possible interactions. 
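As a rough sketch of this coding scheme, the Python fragment below builds the dummy variables for a single record. The variable names and the choice of reference category (married couple without children, not in poverty) are assumptions made for illustration; the text does not say which category was omitted.

```python
# The five family types follow the row labels of Tables 9.2 and 9.3.
# The short codes and the reference category are illustrative assumptions.
FAMILY_TYPES = [
    "married_nochildren",   # assumed reference category
    "married_children",
    "female_nochildren",
    "female_children",
    "other",
]


def saturated_dummies(family_type, poor):
    """Code main-effect and interaction dummies for one record."""
    row = {"poverty": int(poor)}
    for ft in FAMILY_TYPES[1:]:
        is_ft = int(family_type == ft)
        row["ft_" + ft] = is_ft                              # main effect
        row["ft_" + ft + "_x_poverty"] = is_ft * int(poor)   # interaction
    return row


example = saturated_dummies("female_children", poor=True)

# Intercept plus 1 poverty dummy, 4 family type dummies, and 4 interactions.
n_parameters = 1 + len(example)
```

The count of ten parameters equals the number of cells in the five-by-two classification of family type by poverty status, which is what makes the specification "saturated": it reproduces every subgroup mean exactly.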

The next step is to prepare a separate block group data set. The cases in this data 
set are block groups. The first variable for the case is the block group code which will 
be used for merging with the first micro-level data set. In addition, a set of variables 
are coded for the total counts of families by race; specifically, separate variables for 
the count of White, Black, Latino, and Asian families. Next compute a set of variables 


2 Alternatively, one could create an individual-level data base by generating the relevant number of 
individual records for the families represented in each cell of the cross tabulation and assigning 
relevant codes for the social characteristics of the families as appropriate. Of course, it is 
mathematically equivalent and computationally more efficient to use cells as cases and weight by 
cell frequency when performing analyses. However, the alternative approach can be used when 
statistical software cannot apply frequency weights for cases. 

with the values of pairwise proportion White (p) for each of the three possible White- 
minority comparisons. These provide the basis for computing variables that score 
residential outcomes (y) from area (pairwise) proportion White (p) as relevant for 
different segregation indices. For example, in the case of the separation index (S), the 
relevant residential outcome (y) is the value of p. In the case of the dissimilarity index 
(D) the relevant residential outcome is the value of either 1 or 0 depending on whether 
area proportion White (p) is greater than proportion White for the city (P) or not. The 
resulting block group-level data set will then contain variables that will serve as 
dependent variables in micro-level segregation attainment analyses. 
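A minimal sketch of this scoring step follows. The counts are invented, and the function name pairwise_outcomes is hypothetical. One detail is an assumption: an area where p exactly equals P is scored 1 for D here; the score for D is unaffected by how such ties are handled because Whites and the minority group are equally represented in those areas.

```python
def pairwise_outcomes(family_counts, minority):
    """Score residential outcomes from pairwise proportion White.

    family_counts maps block group -> {"White": n, minority: n, ...}.
    Only the two groups in the comparison enter the calculation.
    """
    total_white = sum(c["White"] for c in family_counts.values())
    total_minority = sum(c[minority] for c in family_counts.values())
    city_p = total_white / (total_white + total_minority)   # P for the city

    outcomes = {}
    for block_group, c in family_counts.items():
        pair_total = c["White"] + c[minority]
        if pair_total == 0:
            continue  # p is undefined when neither group is present
        p = c["White"] / pair_total
        outcomes[block_group] = {
            "p": p,
            "y_S": p,                         # outcome scored for S
            "y_D": 1 if p >= city_p else 0,   # outcome scored for D
        }
    return city_p, outcomes


# Invented counts of families by race for two block groups.
family_counts = {
    "BG001": {"White": 128, "Black": 21, "Latino": 30},
    "BG002": {"White": 24, "Black": 135, "Latino": 12},
}
city_p, outcomes = pairwise_outcomes(family_counts, "Black")
```

Calling the function again with "Latino" or "Asian" as the minority group would produce the outcome variables for the other White-minority comparisons.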

The final analysis data set is created by merging the second data set with the first 
data set based on the common block group code. The resulting data set can then be 
used to perform micro-level statistical analyses to analyze residential segregation. 
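The merge itself is a simple keyed join. The sketch below assumes both data sets are held as lists of dictionaries that share a "block_group" key; the record layouts and values are invented for illustration.

```python
def merge_on_block_group(family_records, block_group_rows):
    """Attach block-group outcome variables to each family-level record."""
    lookup = {row["block_group"]: row for row in block_group_rows}
    merged = []
    for record in family_records:
        area = lookup.get(record["block_group"])
        if area is None:
            continue  # no outcome was scored for this block group
        merged.append({**area, **record})
    return merged


# Invented family-level records (cells weighted by nfamilies).
family_records = [
    {"block_group": "BG001", "race": "White", "poverty": 0, "nfamilies": 120},
    {"block_group": "BG001", "race": "Black", "poverty": 1, "nfamilies": 6},
    {"block_group": "BG002", "race": "Black", "poverty": 0, "nfamilies": 90},
]
# Invented block-group records carrying the residential outcomes (y).
block_group_rows = [
    {"block_group": "BG001", "y_S": 0.86, "y_D": 1},
    {"block_group": "BG002", "y_S": 0.15, "y_D": 0},
]

analysis_data = merge_on_block_group(family_records, block_group_rows)
```

Every family-level record in a block group receives the same outcome values, since the outcome is a property of the area in which the family resides.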

I followed the procedures just described to prepare a data set I used to perform 
the analyses establishing how means on the residential outcome of scaled contact 
with Whites (y) vary across subgroups and groups as reported in Tables 9.2 and 
9.3. The results in these tables were obtained via tabulation routines that calculate 
means on the relevant dependent variables (y) across the categories of a cross 
classification table based on micro-level variables measuring the social characteris- 
tics of race, family type, and poverty status. In the analysis the records in the family- 
level data set were weighted by the variable “nfamilies” which has the number of 
families that have the specific combination of social characteristics and reside in the 
block group in question. The same family-level data set can be used to perform 
micro-level statistical analyses such as analysis of variance (ANOVA) and multiple 
regression analysis predicting the dependent variable of individual residential 
attainments using the independent variables of race and other social characteristics.3,4 
I report regression results obtained in this way later in the chapter. 
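The weighted calculation can be illustrated with a toy city of three areas. The counts are invented; the point of the sketch is only that the frequency-weighted difference of group means on y reproduces the conventional computing formulas for D and S.

```python
def weighted_mean(records, value_key, weight_key="nfamilies"):
    """Frequency-weighted mean of one variable over a list of records."""
    total_weight = sum(r[weight_key] for r in records)
    return sum(r[value_key] * r[weight_key] for r in records) / total_weight


def index_score(records, value_key):
    """Difference of group means (White minus Black) on an outcome y."""
    whites = [r for r in records if r["race"] == "White"]
    blacks = [r for r in records if r["race"] == "Black"]
    return weighted_mean(whites, value_key) - weighted_mean(blacks, value_key)


# Invented (White, Black) family counts for three areas.
areas = {"A": (90, 10), "B": (50, 50), "C": (10, 90)}
W_total = sum(w for w, b in areas.values())
B_total = sum(b for w, b in areas.values())
P = W_total / (W_total + B_total)

records = []
for area, (w, b) in areas.items():
    p = w / (w + b)
    y_d = 1 if p >= P else 0
    records.append({"race": "White", "nfamilies": w, "y_S": p, "y_D": y_d})
    records.append({"race": "Black", "nfamilies": b, "y_S": p, "y_D": y_d})

S = index_score(records, "y_S")
D = index_score(records, "y_D")

# Conventional formulas, for comparison.
D_conventional = 0.5 * sum(abs(w / W_total - b / B_total)
                           for w, b in areas.values())
S_conventional = sum((w + b) * (w / (w + b) - P) ** 2
                     for w, b in areas.values()) / (
                         (W_total + B_total) * P * (1 - P))
```

A weighted regression of y on a race dummy would return the same difference as its slope coefficient, which is what allows additional covariates to be introduced alongside race.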


9.3 Substantive Findings 


I now discuss the analysis results in more detail. Table 9.2 shows that in all three 
White-Minority comparisons scaled (pairwise) contact with Whites varies across 
categories of poverty status and family type as well as by race. Group means on this 
residential outcome determine the value of the separation index (S). Two clear pat- 
terns warrant mention even on cursory inspection of the table. The first is that 
minority contact with Whites is consistently lower for poverty families compared 
with non-poverty families. The second is that, within non-poverty families, married 


3 Weighting cases by the cell counts from the summary file tabulation makes this an individual- 
level regression because the cell count registers the number of families that reside in the block 
group in question and have the exact combination of race, family type, and poverty status coded 
for the case. 

4 These can be termed “saturated” models because they include all possible effects of poverty status 
and family type (including all interactions). 

couple families have higher levels of contact with Whites. Table 9.1 also documents 
that overall and within categories of family type minority families are consistently 
more likely to be in poverty than are White families but with Asians being substan- 
tially less disadvantaged than Blacks and Latinos. Table 9.1 also shows that the 
overall percentage of families that are married couples and non-poverty is much 
higher for Whites (81.9 %) and Asians (78.1 %) than for Blacks (46.9 %) and Latinos 
(62.0 %). The combination of these two patterns suggests it is plausible to hypoth- 
esize that group differences on poverty and family composition may play a role in 
making White-Black and White-Latino segregation more pronounced than White- 
Asian segregation. 

Closer inspection of the patterns in Table 9.2 lends additional credibility to this 
conjecture. In the White-Black comparison pairwise contact with Whites (p) varies 
within a narrow interval of 7.6 points for White families ranging from a low of 
83.7 % for female-headed families with children and in poverty to a high of 91.3 % 
for non-poverty married couples with children. For Black families contact with 
Whites is generally much lower than that observed for Whites in every category of 
family type. This suggests that race is a crucial factor in shaping the value of S (the 
group difference of means on p). However, it also is the case that Black contact with 
Whites varies by 21.6 points over categories of poverty status and family type for 
Blacks. The lowest level of 19.7 % is seen for married couple families without chil- 
dren and in poverty and this level also is seen for female-headed families without 
children and in poverty. The highest level of 41.3 % is seen for non-poverty married 
couples with children. The contrast is dramatic; the level of contact seen for the latter 
group is 21.6 points higher and more than double the level seen for the first two 
groups. This suggests that, in addition to the important role of race alone, group 
differences in family type and poverty status also might impact the value of S for the 
White-Black comparison. 

Similar patterns are evident in the results for the White-Latino comparison and 
the White-Asian comparison. In the White-Latino comparison Latino contact with 
Whites (p) is lower than that observed for Whites for every combination of family 
type and poverty status suggesting a clear “across the board” race effect. But it also 
is clear that contact with Whites varies across categories of family type and poverty 
status; by 18.7 points for Latinos and by 13.2 points for Whites. Combining this 
information with the knowledge that Latinos are disproportionately concentrated in 
categories of family type and poverty status that experience lower levels of contact 
with Whites suggests that group differences in distribution by poverty and family 
type may impact the level of White-Latino segregation. 

In the White-Asian comparison Asian contact with Whites (p) is lower than that 
observed for Whites across all categories of family type and poverty status, again 
suggesting an “across the board” race effect. But Asian contact with Whites varies 
much more (by 13.5 points) across categories of family type and poverty status than 
is observed for Whites (only 1.9 points), thus lending plausibility to the hypothesis 
that group differences in distribution by poverty and family type may impact the 
level of White-Asian segregation. 

In sum, the patterns documented in Tables 9.1 and 9.2 lend plausibility to the 
hypothesis that group differences in social characteristics might play a non-trivial 
role independent of race in contributing to overall segregation. Without going into 
the same level of detail, I note that similar conclusions can be drawn based on 
reviewing the data on residential outcomes that determine the value of the dissimi- 
larity index (D) presented in Table 9.3. The key finding is that the subgroup means 
that determine D vary across poverty and family type within race. This raises the 
possibility that group differences in distribution across these social categories may 
be a factor contributing to segregation as measured by D. 


9.4 Opportunities to Perform Standardization 
and Components Analysis 


The micro-level data set used to prepare Tables 9.1, 9.2, and 9.3 also can be used to 
apply the workhorse demographic techniques of standardization and components 
analysis (e.g., Kitagawa 1955; Winsborough and Dickinson 1971; Althauser and 
Wigler 1972; Iams and Thornton 1975; Jones and Kelley 1984) to gain insights into 
what factors give rise to segregation. The technique of standardization involves 
adopting a “standard” relative frequency distribution for poverty status and family 
type and using it, not the “observed” distributions given in Table 9.1, to weight the 
group-specific means on residential outcomes over poverty status and family type to 
calculate “expected” group means on residential outcomes. The resulting “standardized” 
group means can be interpreted as the group averages on segregation-relevant 
residential outcomes (y) that would result if both groups had the same “standard” 
distribution on social characteristics while continuing to experience their 
“observed” residential outcomes documented in Table 9.2. The difference between 
the two group means in the standardized comparison can be interpreted as the level 
of segregation that remains when group differences in distribution by family type 
and poverty status have been “taken into account” by statistically setting them to be 
equal. 
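The logic of standardization can be made concrete with a small numerical sketch. All counts and subgroup means below are invented and use only two subgroups for brevity; the pooled distribution of both groups is used as the "standard", which is one possible choice among several.

```python
# Invented subgroup counts and subgroup means on the residential outcome y.
counts = {
    "White": {"nonpoor": 900, "poor": 100},
    "Black": {"nonpoor": 600, "poor": 400},
}
subgroup_means = {
    "White": {"nonpoor": 0.90, "poor": 0.80},
    "Black": {"nonpoor": 0.40, "poor": 0.20},
}


def shares(cell_counts):
    """Convert counts to a relative frequency distribution."""
    total = sum(cell_counts.values())
    return {k: n / total for k, n in cell_counts.items()}


def weighted_group_mean(means, weights):
    """Weight subgroup means by a relative frequency distribution."""
    return sum(weights[k] * means[k] for k in means)


# "Standard" distribution: both groups pooled (an illustrative choice).
pooled = {k: counts["White"][k] + counts["Black"][k] for k in counts["White"]}
standard = shares(pooled)

observed = {g: weighted_group_mean(subgroup_means[g], shares(counts[g]))
            for g in counts}
standardized = {g: weighted_group_mean(subgroup_means[g], standard)
                for g in counts}

observed_gap = observed["White"] - observed["Black"]            # about 0.570
standardized_gap = standardized["White"] - standardized["Black"]  # about 0.525
```

In this toy case the gap falls from 0.570 to 0.525, so most of the group difference persists after the two groups are given the same distribution across poverty status.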

Table 9.4 reports results of standardization analyses of the type just outlined. In 
conducting this analysis I adopted the observed distribution of all families (both 
White and minority group combined) over the categories of poverty status by family 
type as the relevant “standard” for the distribution of social characteristics. The top 
panel of the table reports results for the average levels on residential outcomes (y) 
that determine the value of the separation index (S) that would obtain for Whites 
and minorities if they had the same “standard” distribution for social characteristics. 
In the White-Black comparison the standardized mean for Whites is 89.46. This is 
about 0.40 points lower than the observed mean for Whites of 89.86. The standard- 
ized mean for Blacks is 35.07. This is about 2.59 points higher than the observed 
mean for Blacks of 32.48. The difference of the standardized group means can be 
interpreted as the value of the separation index (S) standardized to the condition of 
Whites and Blacks having identical distributions across family type and poverty 

Table 9.4 Observed and standardized White-Minority segregation comparisons, Houston, Texas, 2000 

                                      White-Black   White-Latino   White-Asian 

Separation index (S) 
  Observed group means on scaled contact with Whites (y) 
    White mean (y)                       89.86          81.12          93.79 
    Minority mean (y)                    32.48          40.17          69.91 
    Difference                           57.38          40.95          23.88 
  Means standardized on overall distribution for family type and poverty status 
    White mean (y)                       89.46          80.35          93.78 
    Minority mean (y)                    35.07          42.06          69.59 
    Difference                           54.38          38.29          24.19 

Index of dissimilarity (D) 
  Observed group means on scaled contact with Whites (y) 
    White mean (y)                       87.73          81.49          75.15 
    Minority mean (y)                    16.75          23.12          16.93 
    Difference                           70.98          58.37          58.22 
  Means standardized on overall distribution for family type and poverty status 
    White mean (y)                       87.04          80.26          75.28 
    Minority mean (y)                    19.40          25.58          16.83 
    Difference                           67.64          54.68          58.45 


Source: US Census Summary File 3 


status. The initial observed value of S was 57.38 points. The standardized value of 
S is 54.38 points. Thus, “standardizing” the comparison to a common distribution 
on poverty status and family type reduces the value of S by 3.00 points. This result 
provides a statistically sound basis for concluding that White-Black differences in 
the social characteristics considered here play only a small role in determining the 
overall level of White-Black segregation; simply put, “controlling” for group differ- 
ences on social characteristics using sound methods of statistical analysis produces 
only a modest reduction in segregation. 

This result also can be interpreted as indicating that the level of segregation as 
assessed by the observed value of S traces primarily to the effect of race. That is, 
group separation as measured by S traces to group differences in contact with 
Whites that arise independent of poverty status and family type. A more thorough 
decomposition analysis (per Kitagawa 1955; Althauser and Wigler 1972; Iams and 
Thornton 1975; Jones and Kelley 1984) could quantify this in a more careful way. Of 
course, like all standardization and decomposition exercises, thoughtful interpreta- 
tions must consider the theoretical relevance of the “control” variables and the ade- 
quacy of the micro-level analysis that seeks to capture the relationship between 
non-racial social characteristics and segregation-relevant residential attainments. 

Table 9.4 also reports results of standardization analyses for the separation index 
(S) for the White-Latino and White-Asian comparisons. These analyses also indicate 
that differences in group distribution over family type and poverty status do not play 
a major role in determining the overall level of segregation between the groups. In 
the case of the White-Latino comparison, standardizing on poverty status and family 
type reduces S by 2.66 points lowering it from 40.95 to 38.29. In the case of the 
White-Asian comparison, standardizing on poverty status and family type increases 
S by 0.31 points raising it from 23.88 to 24.19. This suggests that group differences 
in family type and poverty status serve to obscure the impact of race on overall 
White-Asian segregation. 
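The changes in S discussed above can be recomputed directly from the group means printed in the separation index panel of Table 9.4. The short check below is only an arithmetic illustration. Because it starts from means rounded to two decimals, a recomputed difference can depart from the published difference by 0.01, as happens for the standardized White-Black comparison.

```python
# (White mean, minority mean) pairs copied from the S panel of Table 9.4.
table_9_4_S = {
    "White-Black":  {"observed": (89.86, 32.48), "standardized": (89.46, 35.07)},
    "White-Latino": {"observed": (81.12, 40.17), "standardized": (80.35, 42.06)},
    "White-Asian":  {"observed": (93.79, 69.91), "standardized": (93.78, 69.59)},
}

changes = {}
for comparison, panels in table_9_4_S.items():
    observed_S = panels["observed"][0] - panels["observed"][1]
    standardized_S = panels["standardized"][0] - panels["standardized"][1]
    changes[comparison] = standardized_S - observed_S
    print(f"{comparison}: observed S = {observed_S:.2f}, "
          f"standardized S = {standardized_S:.2f}, "
          f"change = {changes[comparison]:+.2f}")
```

Negative changes indicate that standardization lowers S; the positive change for the White-Asian comparison indicates that it raises S slightly.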

The lower panel of Table 9.4 reports results of a set of parallel analyses focusing 
on segregation measured using the index of dissimilarity (D). To perform this paral- 
lel analysis, I made only one change; I used a new dependent variable; namely, y as 
scored for D (reported in Table 9.3) instead of y as scored for S (reported in 
Table 9.2). Recall that in this case y is now scored 1 if p ≥ P and 0 otherwise. The 
impact of standardizing the White-Minority comparison to a common distribution 
on poverty status and family type here is very similar to that seen for the analysis for 
S. In the case of the White-Black comparison, standardizing on poverty status and 
family type reduces D by 3.34 points from 70.98 to 67.64. For the White-Latino 
comparison, the standardization exercise reduces D by 3.69 points from 58.37 to 
54.68. For the White-Asian comparison, standardizing on poverty status and family 
type increases D by 0.23 points from 58.22 to 58.45. Thus, as seen in the analysis 
for S, the level of segregation measured using D changes little when one uses appro- 
priate statistical methods to take account of the possible impact of group differences 
in family type and poverty status. 

I also performed similar standardization exercises for other segregation indices, 
specifically G, R, and H. However, I do not report the details here as the basic finding 
is the same in all cases. That is, when analyzing group differences in residential 
outcomes that determine segregation as measured by G, R, and H, standardizing the 
White-Minority comparisons to a common group distribution on poverty status and 
family type reduces segregation by only modest amounts. 


9.5 Comparison with Previous Approaches to “Taking 
Account” of Non-racial Social Characteristics 


The ability to conduct the standardization exercises just reviewed is a completely 
new option made possible by the difference of means framework for measuring 
segregation. Several considerations make this approach superior to current practices 
for assessing or controlling for the role of non-racial social and economic 
characteristics of individuals on segregation. First, the approach can be easily 
extended to directly “control for” the role of many social characteristics in a single 
analysis where previously this has not been feasible. Second, the approach can draw 
on a broader range of information and a larger number of cases than is typical in 
current approaches to taking account of non-racial social characteristics and as a 


5 This is not surprising as the discussions in Chaps. 5, 6, and 7 note that when D and S give similar 
results all popular indices of uneven distribution will give similar results. 

result yields results that are more appropriate and statistically reliable. Third, the 
results of the approach are much less susceptible to problems of distortion resulting 
from index bias and ecological fallacies than are results of current practices. I now 
briefly comment on each of these points. 

The prevailing approach for taking account of the impact that factors other than 
group membership may have on segregation involves calculating segregation scores 
for subsets of individuals from the two groups that are matched on social character- 
istics. In the present context, that would involve calculating as many as 10 different 
White-Black segregation scores, one each based on just the families found in the 
10 categories of family type and poverty status. Or, for simplicity, the analysis might 
be limited to calculating the index score for one carefully chosen subgroup compari- 
son such as non-poverty, married couple families with children, the family type with 
the largest number of families across all four racial groups. When the obtained 
index score is lower than the score for the overall segregation comparison, the 
result is interpreted as indicating that segregation is lower when social characteristics 
are “controlled” and thus supports the conclusion that the impact of group differences 
on social characteristics on segregation is important. Alternatively, when 
the score obtained is not lower than the score for overall segregation, the result is 
interpreted as indicating that the impact of group differences on social characteristics 
on segregation is modest or unimportant. 

Unfortunately, basing the analysis on segregation scores calculated for matched 
comparisons involving small subgroup numbers often introduces non-trivial com- 
plications and concerns. One problem is that the approach subtly changes the sub- 
stantive and quantitative relevance of the analysis. Note that the standardized 
segregation index scores reported in Table 9.4 are based on the full group distribu- 
tions over many combinations of social characteristics and thus register the full 
spectrum of patterns of segregation for racial comparisons between and across all 
combinations of the 10 categories of family type and poverty status. 

Anchoring the scores on the full range of data for both groups carries statistical 
and substantive benefits. Using the full group makes the comparison more statisti- 
cally reliable; thus, for example, the standardized group means that determine the 
standardized values of S and D have smaller standard errors than group means com- 
puted for narrow subgroups. Substantively, using the full group data is attractive 
because it assesses segregation patterns between and across all combinations of 
social characteristics not just for a narrowly specified comparison that could poten- 
tially be idiosyncratic. Arguably this protects against getting unusual results for a 
particular narrowly defined comparison. Importantly, the approach also does not 
exclude the cross-category comparisons which quantitatively make large contribu- 
tions to determining overall segregation but are completely ignored when compari- 
sons are restricted to only one-to-one matches on social characteristics. 

Another more technical problem is that scores based on narrowly defined sub- 
groups are prone to being distorted by index bias. The problem of index bias is 
well-known and potentially vexing. Accordingly I give it extended attention in 
Chaps. 14, 15, and 16. Concern about index bias is especially relevant when group 
counts in spatial units are small and group ratios are imbalanced (Winship 1977). 
This problem is likely to be salient when subgroup comparisons are based on small 
subsets of cases that exactly match on non-racial social characteristics. For example, 
if one matches White and Black families on poverty status and family type, the 
counts of families in each area will drop substantially. Furthermore, the underlying 
problem is likely to be even worse than it appears on first consideration. The reason 
for this is that the census tabulations that include other social characteristics in addi- 
tion to race are based on samples instead of full counts. The summary file tabula- 
tions report “estimated” full counts. In fact, the analysis rests on a much smaller 
number of underlying cases. In the present example using data for 2000, the data are 
based on an approximate 1-in-6 (16.7 %) sample. Using more recent five-year sum- 
mary files from the American Community Survey, the data would be based on a 
1-in-20 (5 %) sample. Analysis of segregation between “matched” subsets of cases 
thus is likely to rest on a small set of cases in each block group. 
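The sensitivity of index scores to small counts can be illustrated with a simple simulation. This is a sketch under stated assumptions and not the analysis reported in Chaps. 14, 15, and 16: families from two equal-sized groups are assigned to areas purely at random, so the "true" level of segregation is zero and any positive value of D reflects bias alone. The number of areas, the group sizes, and the random seed are arbitrary choices.

```python
import random


def dissimilarity(white_counts, black_counts):
    """Conventional computing formula for the dissimilarity index D."""
    W = sum(white_counts)
    B = sum(black_counts)
    return 0.5 * sum(abs(w / W - b / B)
                     for w, b in zip(white_counts, black_counts))


def simulate_random_d(n_areas, families_per_group, rng):
    """D when both groups are distributed across areas at random."""
    white = [0] * n_areas
    black = [0] * n_areas
    for _ in range(families_per_group):
        white[rng.randrange(n_areas)] += 1
        black[rng.randrange(n_areas)] += 1
    return dissimilarity(white, black)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# About 5 families per group per area versus about 250 per group per area.
d_small = simulate_random_d(n_areas=200, families_per_group=1000, rng=rng)
d_large = simulate_random_d(n_areas=200, families_per_group=50000, rng=rng)
```

With only a handful of families per group in each area, D takes a substantial positive value even though residence is unrelated to race; with large counts per area it falls close to zero. Matching on social characteristics and working from sample-based tabulations both push the analysis toward the small-count situation.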

Another problem is that, even under the best of conditions, it is usually infeasible 
to extend this conventional approach to take account of more than one or two non- 
racial characteristics at a time. Restricting the comparison to White and minority 
families matched on several characteristics at once will almost always result in bas- 
ing the analysis on an unacceptably small number of micro-level cases. In contrast, 
the standardization approach applied in this chapter draws on the full population in 
each group and can in principle include many more social characteristics. The 
“ANOVA-style” reliance on categories instead of continuous predictors in the 
examples considered here can run into problems when means for some subgroups 
are less reliable due to being based on a small number of cases. However, the problem 
is less troublesome than under the usual approach used in the literature. Moreover, it 
can be mitigated by using continuous measures in place of categories and adopting 
refined regression modeling strategies such as using multi-level specifications (dis- 
cussed in Chap. 10) to improve estimation of effects. Thus, the difference of means 
framework provides clear advantages when researchers wish to take account of sev- 
eral non-racial characteristics at once. 


9.6 Aggregate-Level Controls for Micro-level Determinants 
of Residential Outcomes 


Segregation studies sometimes “take account” of group differences on social char- 
acteristics that play a role in residential outcomes in a fundamentally different way; 
namely, by estimating aggregate-level regressions where a measure of group disparity 
on a relevant social characteristic (e.g., income or poverty status) is used to predict 
cross-city variation in segregation index scores. This strategy raises concerns 
about the risk of flawed inference associated with the “ecological” or “aggregate” 
fallacy. 

It is fair to say that this concern does not seem to be widely recognized because 
the practice is routine in empirical studies and apparently not subject to strong criticism.6 
Two factors may help explain why the prevailing practice is seen as 
non-controversial instead of seriously flawed. One is that traditional formulations of 
segregation indices encourage the view that the index score is an aggregate-level 
characteristic of cities that is not directly a product of individual-level attainment 
processes in a way that would raise strong concerns about the undesirable consequences 
of the aggregate fallacy. The second is that, while studies in the location 
attainment tradition could potentially promote the view that segregation should be 
understood as arising out of micro-level residential attainment processes, they ulti- 
mately do not do so because until now micro-models could not be used to directly 
investigate segregation as measured by the dissimilarity index (D) and other popular 
aggregate-level indices. 

The findings in this chapter show that analysis of segregation using popular 
aggregate-level measures can be joined seamlessly with analyses of micro-level 
residential attainment processes. The difference of means formulation of standard 
segregation indices makes this possible by establishing that segregation can be 
understood as a difference of group means on individual-level residential outcomes 
that in a given city are determined by a micro-level attainment process where many 
individual-level characteristics can impact segregation. The data and analyses pre- 
sented in Tables 9.2, 9.3, and 9.4 clarify how the individual-level characteristics of 
race, poverty status, and family type affect residential outcomes (y) that then aggregate 
in a simple additive way to determine the level of segregation in the city. This 
example establishes that the parallel with analyses of group differences in other 
socioeconomic attainment outcomes (e.g., education, occupation, income, home 
ownership, etc.) is exact. This then highlights a lack of correspondence on another point; 
namely, the failure of segregation researchers to show appropriate concern for the 
aggregate fallacy in aggregate-level segregation studies. 

Researchers analyzing group differences in income understand that the aggregate- 
level outcome of inter-group income inequality in a particular city emerges as a 
product of an underlying micro-level process of income attainment for that city. As 
a result, it is easier for these researchers to recognize that the ideal way to obtain a 
sound assessment of the role that non-racial social characteristics play in producing 
group income inequality in a city is to draw on detailed micro data for that city. It 
also is easier for these researchers to recognize that attempts to take account of the 
role of non-racial social characteristics in producing inter-group income inequality 
using only aggregate data carries a high risk of mistaken inference due to the aggre- 
gate fallacy. I reviewed these issues more than two decades ago in an article that 
outlined the nature of the problem in detail and provided an empirical demonstra- 
tion of how aggregate-level analysis leads to errors of inference and mistaken con- 
clusions about the role of group differences in social characteristics for cross-area 
variation in group income inequality (Fossett 1988). Researchers interested in this 
topic appear to have adapted and moved forward. In recent decades there has been 
a fundamental change in the research literature. Aggregate-level analyses of cross- 


6 For example, in my experience journal reviewers not only do not object to this practice, they often 
request that it be incorporated into the analysis. 

city variation in group income inequality were common in earlier decades and they 
routinely included aggregate-level measures to “control” for the impact of group 
differences on individual-level characteristics that predict income (e.g., income).7 
Such studies are no longer accepted as most researchers now understand that one 
must use disaggregated data to properly investigate these issues. 

A similar reckoning is looming for the literature investigating cross-city varia- 
tion in residential segregation. Concern about the aggregate fallacy currently is 
minimal because segregation researchers are not in the habit of viewing city-level 
segregation scores as mapping directly onto micro-level residential outcomes. 
Accordingly, segregation researchers do not automatically think in terms of using 
micro data to take account of the role of non-racial social characteristics in shaping 
residential segregation. This creates a “blind spot” for the possibility that key find- 
ings from studies that investigate cross-city variation in segregation may be suspect 
because the studies use research designs that incorporate the aggregate fallacy. 

The data and analyses presented in Tables 9.2, 9.3, and 9.4 provide examples of 
how the differences of means approach makes it possible to “take account of” the 
impact of group differences on social characteristics on segregation in a way that is 
superior and offers a better chance to make correct inferences in comparison to past 
approaches. The data in these tables cast segregation as a group difference of means 
on residential outcomes (y) that emerge from a micro-level attainment process 
where race, poverty status, and family type all play a role in influencing residential 
outcomes. Once segregation is conceptualized in this way, it is clear that the proper 
statistical approach for taking account of group differences on poverty and family 
type is to perform city-specific standardization analyses using relevant attainment 
data disaggregated at the micro level for the city in question. 
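
The logic of a city-specific standardization can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure behind Table 9.4: the subgroup counts and means are hypothetical, only one social characteristic (poverty status) is used, and the names `cells`, `composition`, and `mean_outcome` are invented for the example.

```python
# Hypothetical subgroup data (NOT taken from Tables 9.2-9.4). For each race
# group and poverty category: (number of families, mean residential outcome y),
# where y is pairwise percent White in the family's area, as scored for S.
cells = {
    "White": {"nonpoor": (950, 90.0), "poor": (50, 85.0)},
    "Black": {"nonpoor": (800, 34.0), "poor": (200, 26.0)},
}

def composition(group):
    """Share of the group's families in each poverty category."""
    total = sum(n for n, _ in cells[group].values())
    return {cat: n / total for cat, (n, _) in cells[group].items()}

def mean_outcome(group, shares):
    """Group mean of y when its subgroup means are weighted by `shares`."""
    return sum(shares[cat] * cells[group][cat][1] for cat in shares)

white_mean = mean_outcome("White", composition("White"))
black_mean = mean_outcome("Black", composition("Black"))
observed_gap = white_mean - black_mean

# Standardization: keep the Black subgroup means but weight them by the White
# poverty distribution, which removes the group difference in composition.
black_mean_std = mean_outcome("Black", composition("White"))
standardized_gap = white_mean - black_mean_std

print(f"observed gap:     {observed_gap:.2f}")
print(f"standardized gap: {standardized_gap:.2f}")
```

In this made-up case the gap barely moves after standardization, which is the kind of result that would indicate group differences in poverty account for little of the difference of means.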

The limitations of the prevailing practice are revealed by the standardization 
analyses reported in Table 9.4. The results from the analyses directly answer the 
question of whether racial segregation arises due to group differences in poverty 
status and family type for the city in question. In each group comparison, the answer 
obtained is conceptually and statistically sound. The answer developed from analy- 
ses reported in Tables 9.1, 9.2, 9.3, and 9.4 also is definitive and complete. Group 
differences on social characteristics do not play a significant role in accounting for 
the observed level of White-Black segregation. This conclusion is anchored in a 
direct examination of the micro-level relationships between White and Black resi- 
dential attainments in Houston in 2000. It cannot be improved by examining 
aggregate-level data for other White-Minority comparisons in the same city or even 
hundreds of such comparisons across other cities. 

Moreover, analysis using only aggregate-level measures can easily lead to mis- 
taken conclusions. For example, the analyses show that White-Asian segregation is 
lower than White-Black segregation and they also show that White-Asian differ- 
ences in poverty are smaller than White-Black differences in poverty. The logic of 


7 Examples include Becker (1971 [1957]), Bahr and Gibbs (1966), Jiobu and Marshall (1971), 
Roof (1972), LaGory and Magnani (1979), and Elgie (1980) among many others as the practice 
was routine. 
aggregate-level analysis would infer from this pattern that segregation is more 
pronounced when group differences in poverty are large. But analysis of the 
relationship using relevant micro-level data establishes that the impact of group differences 
in poverty is minimal. 

Similarly, the answer to the question cannot be improved by examining 
aggregate-level data for White-Black segregation in other cities. For example, 
if one examined a large sample of metropolitan areas and found a strong positive 
aggregate correlation between White-Black segregation and White-Black differ- 
ences in poverty or income, the conclusion about the impact of White-Black differ- 
ences in poverty on White-Black segregation in Houston based on the standardization 
analysis in Table 9.4 is not challenged and will stand unchanged. The aggregate- 
level findings are “trumped” by the direct analysis of relevant micro-level data for 
White-Black segregation in Houston. 

In reviewing the general issues in detail in an earlier study (Fossett 1988) I noted 
that, while it is certainly plausible that group differences on social and economic 
characteristics could give rise to group differences on relevant attainment outcomes, 
aggregate-level correlation is not a sound way to assess this possible effect. The 
sound way to assess the impact in a given city and group comparison is by working 
with relevant disaggregated data to examine the relationship at the micro level. 
Resorting to aggregate-level controls is tempting, but there are compelling reasons 
to discontinue this practice. One such reason can be summarized as follows. 


Urban-ecological theories of cross-area variation in racial stratification provide a strong 
basis for expecting group differences on inputs to attainment processes to be spuriously 
correlated with group differences on outcomes of attainment processes at the aggregate 
level (Fossett 1988). 


The fundamental premise of urban-ecological theories of racial stratification is 
that some community-level factors shape group relations “across the board.” If so, a 
general climate of minority disadvantage, tracing for example to comprehensive Jim 
Crow laws or high levels of White prejudice and discrimination in socioeconomic 
attainment processes, can lead to both high levels of White-Black differences in 
poverty and White-Black segregation. However, the resulting correlation of segre- 
gation with group differences in poverty and income produced by this social 
dynamic can easily be spurious, not causal. Thus, for example, if discrimination in 
housing severely constrains the residential opportunities of non-poverty Black 
households reducing their segregation-relevant contact with Whites, eliminating 
group differences in poverty will have no impact on segregation. 
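
The argument can be illustrated with a small deterministic simulation. This is only a sketch under assumptions introduced for the example: the five cities, the `climate` factor, and all numeric values are hypothetical, and Black families are assumed to have the same contact with Whites whether or not they are in poverty.

```python
# Hypothetical cities (illustrative numbers only). `climate` stands for an
# unmeasured city-level factor that shapes group relations "across the board".
climates = [0.1, 0.3, 0.5, 0.7, 0.9]

def simulate_city(climate):
    white_poor, black_poor = 0.05, 0.05 + 0.30 * climate  # poverty rates
    white_y = 90.0  # Whites' mean contact with Whites
    # Assumption: housing discrimination gives poor and non-poor Black
    # families the same contact with Whites, so poverty has no micro effect.
    black_y = {"poor": 80.0 - 60.0 * climate, "nonpoor": 80.0 - 60.0 * climate}
    black_mean = black_poor * black_y["poor"] + (1 - black_poor) * black_y["nonpoor"]
    # Standardize Blacks to the White poverty distribution.
    black_std = white_poor * black_y["poor"] + (1 - white_poor) * black_y["nonpoor"]
    return {
        "poverty_gap": black_poor - white_poor,
        "segregation": white_y - black_mean,
        "standardized": white_y - black_std,
    }

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

cities = [simulate_city(c) for c in climates]
aggregate_r = pearson([c["poverty_gap"] for c in cities],
                      [c["segregation"] for c in cities])
print(f"aggregate correlation: {aggregate_r:.3f}")
for c in cities:
    print(f"segregation {c['segregation']:.1f}, after standardization {c['standardized']:.1f}")
```

Across these cities the poverty gap and segregation are almost perfectly correlated, yet eliminating the poverty gap within any one city leaves its segregation score unchanged.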

It is not possible to sort out whether the aggregate relationship is spurious or 
causal with aggregate-level data. One must ultimately examine relevant micro data 
to directly assess whether reducing group differences in poverty or income would in 
fact reduce segregation in a given city. In Chap. 10 I present empirical analyses that 
illustrate how to perform such analyses. These analyses document and affirm that 
the empirical findings and central conclusions I reported in Fossett (1988) also 
apply to analyses of residential segregation. Specifically, the analyses document two 
parallels with the earlier study. The first is that aggregate-level correlations and 
regressions suggest that group differences in income play a major role in accounting for 
cross-city variation in segregation. The second is that this conclusion is shown to be 
incorrect when one uses micro-level data to properly take account of the impact of 
income differences on segregation. Based on this I caution segregation researchers 
to take seriously the concern that the practice of using aggregate-level regressions 
to assess the role of factors that operate at the micro-level is unsound and can yield 
misleading results. 


9.7 New Interpretations of Index Scores Based on Bivariate 
Regression Analysis 


Investigation of segregation using the technique of standardization analysis joins 
aggregate-level analysis with residential attainment analysis by clarifying how seg- 
regation index scores for a city arise from micro-level residential attainment pro- 
cesses shaped by racial and non-racial social characteristics. This point can be 
highlighted by noting that the data presented in Tables 9.2 and 9.3 correspond to 
predictions of mean residential attainments derived from individual-level models of 
residential attainment. More precisely, the subgroup means on residential outcomes 
correspond to predictions from individual-level analysis of variance (ANOVA) 
models or, alternatively, individual-level regression models. The tables report 
means for residential attainments (y=scaled pairwise contact with Whites as rele- 
vant for S or D) for individual families grouped by category of race, poverty status, 
and family type. This corresponds to an individual-level ANOVA or regression 
analysis predicting residential attainments based on three categorical independent 
variables: family type (five categories), poverty-status (two categories), and race 
(two categories). Thus, the subgroup means reported in Tables 9.2 and 9.3 corre- 
spond to predictions from a “fully saturated” model which estimates all possible 
additive and non-additive effects for race, poverty status, and family type on resi- 
dential attainments. The standardization analyses reported in Table 9.4 implicitly 
rest on these attainment models. It is a natural next step to explicitly focus on the 
results of the attainment model to assess more specifically how the effects of the 
independent variables shape residential segregation. 

The difference of means framework yields a set of new and potentially 
attractive interpretations for segregation index scores. Specifically, the values of 
scores for indices such as S and D now can be described as reflecting the effect or 
impact of race on residential outcomes that determine segregation. This interpretation 
is straightforward in a bivariate model of individual-level residential attainment 
where race is the only predictor. When introducing the difference of means formula- 
tions, I offered computing formulas for obtaining index scores as a difference of 
group means. I now note that the index scores also can be obtained via an 
individual-level bivariate regression analysis in which a dummy variable for race 
(i.e., group membership) is used to predict the residential outcomes (y) that are 
relevant for a particular index. 
Table 9.5 Bivariate segregation attainment regressions predicting residential outcomes (y) that 
additively determine White-Minority segregation for selected indices, Houston, Texas, 2000 

Independent variable          G*    G/2*    D/2*       D       R       H       S
White-Black comparison (N=811,924)
  White (0,1 = White)      87.07   43.54   35.48   70.97   47.02   53.59   57.39
  Constant                 33.64   16.82   22.96   16.75   25.96   28.98   32.48
  R Square                 0.412   0.412   0.442   0.442   0.495   0.557   0.574
  Implied index score      87.07   87.08   70.96   70.97   47.02   53.59   57.39
White-Latino comparison (N=911,060)
  White (0,1 = White)      74.19   37.09   29.19   58.37   28.11   35.46   40.96
  Constant                 49.53   24.76   30.14   23.12   35.26   37.50   40.17
  R Square                 0.359   0.359   0.317   0.317   0.385   0.404   0.410
  Implied index score      74.19   74.18   58.38   58.37   28.11   35.46   40.96
White-Asian comparison (N=672,968)
  White (0,1 = White)      76.28   38.14   29.11   58.22   34.96   31.31   23.88
  Constant                 29.94   14.97   23.27   16.93   35.74   52.16   69.91
  R Square                 0.131   0.131   0.122   0.122   0.135   0.198   0.239
  Implied index score      76.28   76.28   58.22   58.22   34.96   31.31   23.88

Source: US Census 2000, Summary File 3 
Notes: G* is given by $(\bar{Y}_1 - \bar{Y}_2)$ when y is scored 0-200 and G/2* and D/2* are given by 
$2 \cdot (\bar{Y}_1 - \bar{Y}_2)$ when y is scored 0-100. All regression coefficients are statistically 
significant at 0.001 or better 


In the case of White-Black segregation as measured by the separation index (S), 
the regression would include a dummy variable for “White” coded 1 if White and 0 
if Black to predict the residential outcome of pairwise contact with Whites ($y = p$). 
The value of the estimated regression intercept ($b_0$) will indicate the average 
contact with Whites for Blacks (i.e., the baseline group coded 0 on race). The value of 
the unstandardized regression coefficient for White ($b_1$) will indicate the extent to 
which the White mean for contact with Whites (i.e., the group coded 1 on race) 
deviates from the Black mean for contact with Whites. Accordingly, the value of the 
regression coefficient also will exactly equal the value of the segregation index 
score; that is, $b_1 = S$. (And, for the sake of completeness, mean contact with Whites 
for Whites will be given by $b_0 + b_1$.) At one level, this is not surprising as most 
readers will already be aware that bivariate dummy variable regression is 
mathematically equivalent to a difference of means comparison. But it is a new development 
in segregation measurement theory to interpret a segregation index score as the 
effect of race in a micro-level process of residential attainment. Thinking in this way 
opens up new avenues for exploring and interpreting segregation. 
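
The equivalence can be checked numerically. The following is a minimal sketch with hypothetical cell records (not the Houston data), using the closed-form solution for a frequency-weighted regression with one predictor; the variable names are illustrative.

```python
# Hypothetical cell records: (white dummy, y, family count), where y is the
# pairwise percent White in the family's area and the count is a frequency weight.
records = [
    (1, 95.0, 400), (1, 80.0, 100),   # White families
    (0, 20.0, 300), (0, 60.0, 100),   # Black families
]

def wmean(pairs):
    """Frequency-weighted mean of (value, weight) pairs."""
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)

x_bar = wmean([(x, w) for x, _, w in records])
y_bar = wmean([(y, w) for _, y, w in records])

# Weighted least squares for y = b0 + b1 * white (closed form for one predictor).
cov_xy = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in records)
var_x = sum(w * (x - x_bar) ** 2 for x, _, w in records)
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar

# Difference of group means computed directly.
white_mean = wmean([(y, w) for x, y, w in records if x == 1])
black_mean = wmean([(y, w) for x, y, w in records if x == 0])
s_index = white_mean - black_mean

print(f"b0 = {b0:.2f}, Black mean = {black_mean:.2f}")
print(f"b1 = {b1:.2f}, difference of means = {s_index:.2f}")
```

The intercept reproduces the mean for the group coded 0 and the slope reproduces the difference of group means, so the slope can be read as the index score.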

Table 9.5 reports results for a series of bivariate regressions of the type just 
described estimated using the micro-level data set for Houston, Texas introduced 
earlier in this chapter. Recall that this data set reconstitutes the block group-level 
summary tabulations so the information in the tabulation is organized in a data set 
appropriate for performing individual-level attainment analysis. Cells in the tabulation 
are treated as cases and are coded on independent variables (race, poverty 
status, and family type) to suit the needs of the analysis. Dependent variables relating 
to index-specific scores based on block-group race counts are assigned to block 
groups and then merged with the individual level data based on block group codes. 
The resulting data set can be used to estimate regression analyses in the conventional 
way with the proviso that a variable representing cell counts be used as a 
case-level frequency weight.8 
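
A minimal sketch of this data construction follows. The three block groups and their counts are hypothetical, the cells are coded on race only (poverty status and family type are omitted for brevity), and the helper `gap` is invented for the example. It also checks that the compact, frequency-weighted file and a one-record-per-family file give the same result.

```python
# Hypothetical block-group tabulation of family counts by race of householder.
block_groups = {
    "BG1": {"White": 90, "Black": 10},
    "BG2": {"White": 40, "Black": 60},
    "BG3": {"White": 5, "Black": 95},
}

# Step 1: score the dependent variable for each block group. For S, y is the
# pairwise percent White among White and Black families in the area.
y_by_bg = {bg: 100.0 * c["White"] / (c["White"] + c["Black"])
           for bg, c in block_groups.items()}

# Step 2: treat each tabulation cell as a case coded on race, carry its cell
# count as a frequency weight, and merge y on the block group code.
cases = [{"bg": bg, "white": int(race == "White"), "count": n, "y": y_by_bg[bg]}
         for bg, c in block_groups.items() for race, n in c.items()]

def gap(rows, weight_key=None):
    """White minus Black mean of y; rows are weighted by weight_key if given."""
    def mean(flag):
        sel = [r for r in rows if r["white"] == flag]
        wts = [r[weight_key] if weight_key else 1 for r in sel]
        return sum(r["y"] * w for r, w in zip(sel, wts)) / sum(wts)
    return mean(1) - mean(0)

s_compact = gap(cases, weight_key="count")

# Alternative: one record per family (a larger file that gives the same answer).
expanded = [{"white": c["white"], "y": c["y"]}
            for c in cases for _ in range(c["count"])]
s_expanded = gap(expanded)

print(f"{len(cases)} weighted cases:   S = {s_compact:.2f}")
print(f"{len(expanded)} expanded records: S = {s_expanded:.2f}")
```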

The independent variable used in the bivariate regressions reported in Table 9.5 
is a “dummy” (0, 1) variable for “White” coded 1 for White and 0 for minority 
depending on the race of the family’s householder. The dependent variables are resi- 
dential outcome scores (y) scaled from pairwise area proportion White (p) as appro- 
priate for the segregation index of interest and the relevant group comparison. 
Table 9.5 reports results for separate regression analyses for five segregation 
indices, namely G, D, R, H, and S.9 

An important finding is evident in the results. In each regression analysis, the 
unstandardized regression coefficient for the dummy variable for race (here coded 1 
if White and 0 otherwise) yields the value of the relevant index score (previously 
reported in Table 5.2). In the case of G, individual residential outcomes (y) are 
scored two ways; one to yield G and one to yield G/2. In the latter coding, the value 
of the coefficient for race must be doubled to obtain the value of G. The table also 
reports results for D taken as a crude version of G and thus scores residential out- 
comes (y) in relation to D/2. In this regression the coefficient for race must be 
doubled to obtain the value of D. The table also reports results based on the alterna- 
tive formulation of D where residential outcomes (y) are coded as either 0 or 1. 

Of course, relatively little is gained if we stop with the simple bivariate regres- 
sion analysis. It merely recasts the difference of means comparison reviewed earlier 
in Table 5.2 in the regression (or ANOVA) framework. The most important descriptive 
findings to be gleaned from the analysis, namely the group means and the 
group difference of means, are exactly the same as those previously reported 
in Tables 5.2 and 5.3. So no new information is gained. 

Regression analysis does potentially provide a useful framework for hypothesis 
testing regarding the level of segregation. But this has minimal practical value at the 
bivariate level of analysis as statistical significance is typically not a central concern 
in segregation analysis. Sample sizes and race effects both tend to be large in analy- 
ses of the overall level of segregation and thus statistical tests tend to be significant 
at levels far beyond conventional standards (i.e., 0.05 and 0.01). For example, the 


8 When estimation routines in statistical programs have this capability, the data set can be stored in an 
efficient, compact form. Alternatively, one may create a separate record for each family included 
in the summary file tabulation. The resulting data set will be much larger. Some might find the less 
compact form of the data set more familiar for conducting individual level regression analysis. But 
regression results will be identical either way. 

9 The regressions were estimated using OLS regression. This is satisfactory for present purposes 
and is a convenient choice because it simplifies the presentation and discussion of results. In other 
situations, it may be necessary to use more technically appropriate regression procedures such as 
fractional logit regression (Papke and Wooldridge 1996; Wooldridge 2002) to deal with the prob- 
lem that OLS assumptions are not valid when modeling bounded variables and OLS can yield 
predictions outside of the 0-1 bounds of segregation indices. 
t-ratio for the effect coefficient of race in the bivariate regression of pairwise contact 
with Whites on a dummy variable for race for the White-Black comparison is over 
1000 and the probability of chance deviation from 0.0 is zero out to many decimal 
places. In such circumstances, the usual concerns about statistical significance and 
technical regression assumptions fade into the background. 

The more significant potential benefit of regression analysis is that it provides an 
opportunity to put segregation research on a new path for gaining a better, more 
direct understanding of how segregation arises. Specifically, analyzing segregation 
from the difference of means framework sets segregation researchers on the path of 
investigating segregation using the methods and modeling strategies that status 
attainment researchers routinely use to investigate racial disparities and inequality 
on education, occupation, income, health, and other socioeconomic and life chance 
outcomes. These methods and modeling strategies previously have not been avail- 
able to segregation researchers because the link between micro-level attainment and 
aggregate-level segregation (city segregation scores) was not established. The dif- 
ference of means formulation of segregation indices thus allows researchers to 
move away from focusing simply on the calculation of descriptive index scores that 
summarize the state of segregation at the aggregate level. It instead allows research- 
ers to move toward investigating segregation through the more analytically flexible 
method of performing multivariate analyses to assess the zero-order and net effects 
of race (group membership) on individual-level residential outcomes that directly 
determine the level of segregation. 

I discuss the extension to multivariate analysis of segregation-relevant residential 
outcomes in more detail below. But first it is useful to point out that different indices 
register residential outcomes in different ways (based on index-specific functions 
$y = f(p)$) and that these differences carry implications for interpreting the effect of 
race on residential outcomes in individual-level attainment analyses. Here it is use- 
ful to recall Fig. 5.1 which clarifies how these five segregation indices differ in 
registering residential outcomes (y) scored from area group proportion (p). In the 
case of G, D and R, y is scored as a nonlinear transformation of p that in these group 
comparisons tends to exaggerate group differences at high levels in p and minimize 
group differences over the middle ranges of p. H also involves a similar nonlinear 
transformation, but it is much less dramatic. In contrast, S scores y simply on the 
basis of p and does not subject p to a transformation. This makes the regression 
results for the separation index (S) especially easy to interpret and a good place to 
begin. 

Table 9.5 reports the results for the bivariate regression $y = b_0 + b_1(\text{race})$ 
relevant for investigating White-Black segregation as measured by S as 
$y = 32.5 + 57.4(\text{race})$ where race is coded 1 for White and 0 for Black. In this example, the value of 
the regression constant ($b_0$) is 32.5 and reflects Blacks’ average contact with Whites 
($\bar{Y}_B$). The value of the unstandardized regression coefficient for race ($b_1$) is 57.4. It 
reflects the impact that race has on average contact with Whites; namely, to raise 
contact with Whites by 57.4 points above the level of contact that Blacks 
experience. The sum of $b_0$ and $b_1$ gives the predicted value of 89.9 for Whites’ average 
contact with Whites ($\bar{Y}_W$). These values map exactly onto the terms reported in 
Table 5.3 which showed how index scores can be obtained as differences of group 
means on residential outcomes. Thus, the value of S for White-Black segregation 
overall is 57.4 resulting because White families live in neighborhoods where pair- 
wise percent White averages 89.9 while Black families live in neighborhoods where 
pairwise percent White averages 32.5. 

This highlights the new interpretation available for S as indicating that race — 
specifically, being White instead of Black — “matters” for residential outcomes and 
in this case has the impact of increasing contact with Whites by 57.4 points in com- 
parison with the reference group of Blacks. The magnitude of the effect makes it 
clear that race differences in residential attainment produce substantial residential 
separation between Whites and Blacks as Whites are predicted to reside in predomi- 
nantly White areas and Blacks are predicted to live in predominantly Black areas. 

It is instructive to compare the effect of race in the White-Black comparison with 
the effects of race in the bivariate segregation attainment analyses for the White- 
Latino and White-Asian comparisons. The race effect of 41.0 points in the White- 
Latino regression is approximately 16 points lower than that in the White-Black 
regression. Thus, we can conclude that race “matters less” in promoting residential 
separation of Whites from Latinos than it does in promoting residential separation 
of Whites from Blacks. However, the effect is still large and has the consequence of 
on average placing Whites in predominantly White areas while Latinos are in pre- 
dominantly Latino areas. The race effect of 23.9 points in the White-Asian regres- 
sion is approximately 34 points lower than in the White-Black regression. Based on 
this we can conclude that, while the effect of race is not trivial, race matters much 
less in promoting residential separation of Whites from Asians than it does in pro- 
moting residential separation of Whites from Blacks. One clear indication of this is 
that the effect of race on average leaves both Whites and Asians being predicted to 
reside in predominantly White areas. 
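
The predicted group means behind these comparisons follow directly from the constants and race coefficients for S in Table 9.5. The short sketch below simply carries out that arithmetic; the dictionary layout and names are illustrative.

```python
# Constants (b0) and race coefficients (b1) for S, copied from Table 9.5.
s_results = {
    "White-Black":  {"b0": 32.48, "b1": 57.39},
    "White-Latino": {"b0": 40.17, "b1": 40.96},
    "White-Asian":  {"b0": 69.91, "b1": 23.88},
}

predicted = {}
for comparison, coef in s_results.items():
    minority_mean = coef["b0"]              # predicted y when White = 0
    white_mean = coef["b0"] + coef["b1"]    # predicted y when White = 1
    predicted[comparison] = (minority_mean, white_mean)
    print(f"{comparison}: minority {minority_mean:.1f}, White {white_mean:.1f}, "
          f"S = {coef['b1']:.1f}")
```

Only in the White-Asian comparison does the minority group's predicted pairwise percent White exceed 50.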

The bivariate results for D suggest a somewhat different story. I focus on the 
results for D based on scoring residential outcomes as 0 or 1 according to whether the 
family attains parity on contact with Whites, that is, whether area proportion 
White equals or exceeds the level for the city as a whole (i.e., 1 if $p \geq P$, 0 
otherwise). For this residential outcome, race matters a great deal in all three group 
comparisons. The unstandardized regression coefficients for race take high values in 
each analysis reaching approximately 71.0 in the White-Black analysis, 58.4 in the 
White-Latino analysis, and 58.2 in the White-Asian analysis. In substantive terms, 
we can interpret these effects as indicating that, in each comparison, race — that is 
being White in contrast to being minority — has a large impact on the probability of 
residing in an area where proportion White attains parity with city-wide proportion 
White. 
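
The parity scoring for D can be sketched as follows. The three areas and their counts are hypothetical, and the function name `attains_parity` is invented for the example; the check at the end compares the difference of means with the conventional computing formula for D.

```python
# Hypothetical counts of White and Black families in three areas.
areas = [(90, 10), (40, 60), (5, 95)]
W = sum(w for w, _ in areas)
B = sum(b for _, b in areas)
P = W / (W + B)  # city-wide pairwise proportion White

def attains_parity(w, b):
    """y = 1 if the area's pairwise proportion White p is at least P, else 0."""
    return 1 if w / (w + b) >= P else 0

# Group means of y: every family in an area receives that area's score.
white_mean = sum(w * attains_parity(w, b) for w, b in areas) / W
black_mean = sum(b * attains_parity(w, b) for w, b in areas) / B
d_from_means = white_mean - black_mean

# Conventional computing formula for D, for comparison.
d_conventional = 0.5 * sum(abs(w / W - b / B) for w, b in areas)

print(f"D as difference of means: {d_from_means:.4f}")
print(f"D by conventional formula: {d_conventional:.4f}")
```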

This information is not without value. But it also is important to be aware of what 
is not revealed when modeling micro-level outcomes that determine the value of 
D. Namely, this analysis fails to provide a basis for assessing the quantitative 
differences in the racial composition of the neighborhoods the groups live in. If one 
does not bear this in mind, one can come away with an incomplete and potentially 
misleading impression of the nature of segregation in these three comparisons. This 
is particularly true in the case of the White-Latino and White-Asian analyses. The 
comparison on the effect of race in these analyses shows that it is essentially the 
same in both regression equations. This indicates that the White advantage in 
the probability of attaining parity on area proportion White is the same in relation to 
Latinos and Asians. In addition, comparison of the regression constants indicates 
that, overall, Asians are less likely than Latinos to attain parity on area proportion 
White. The combination suggests that White-Latino segregation and White-Asian 
segregation are very similar. 

But it is important to bear in mind that D is sensitive to group differences in 
attaining “parity” on neighborhood proportion White where “parity” is assessed in 
relation to the citywide pairwise racial proportions. As a result, the effect of race in 
the models for D does not support inferences and interpretations relating to group 
differences in the actual level of pairwise contact with Whites or to group differ- 
ences in “fixed” outcomes such as probabilities of residing in neighborhoods that 
are majority (50%) White, two-thirds (67%) White, or predominantly (e.g., 80%) 
White. For example, in the case of Houston, Texas, Latinos are a much larger group 
than Asians. Accordingly, the “cut point” for scoring of residential outcomes as 
attaining “parity” on area proportion White for the White-Latino comparison is 
much different — specifically, much lower — than the “cut point” for scoring of resi- 
dential outcomes as attaining “parity” on area proportion White for the White-Asian 
comparison. Consequently, a naive interpretation of the race effect in the attainment 
analysis for D might suggest the conclusion that Latinos and Asians fare similarly 
in comparison to Whites but with Asians being less likely than Latinos to live in 
areas that are disproportionately White. But the analysis for S shows that Asians 
live in areas that on average are 69.9% White, a full 29.7 points higher than Latinos 
who on average live in majority Latino areas. This suggests that the substantive 
value of scoring residential “disadvantage” based on “parity” is open to reconsid- 
eration. In particular, I pose the question, “What are the substantive and sociological 
implications of Asians experiencing near-identical disadvantage as Latinos on 
attaining “parity” when the two groups differ greatly in terms of their residential 
separation from Whites?” 
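
A stylized example shows how two comparisons can share the same value of D while differing sharply on S when the minority groups differ in relative size. The two-area "cities" below are hypothetical and chosen only to make the arithmetic transparent; the function `d_and_s` is an illustrative helper.

```python
def d_and_s(areas):
    """Return (D, S, minority mean of p) for a list of (white, minority) counts."""
    W = sum(w for w, _ in areas)
    M = sum(m for _, m in areas)
    d = 0.5 * sum(abs(w / W - m / M) for w, m in areas)
    p = [w / (w + m) for w, m in areas]  # pairwise proportion White by area
    white_mean = sum(w * pi for (w, _), pi in zip(areas, p)) / W
    minority_mean = sum(m * pi for (_, m), pi in zip(areas, p)) / M
    return d, white_mean - minority_mean, minority_mean

# Hypothetical two-area cities (illustrative counts only).
large_minority = [(90, 10), (10, 90)]      # groups of equal size
small_minority = [(900, 10), (100, 90)]    # minority is about 9% of the pair

d1, s1, m1 = d_and_s(large_minority)
d2, s2, m2 = d_and_s(small_minority)

print(f"large minority: D = {d1:.2f}, S = {s1:.2f}, minority mean p = {m1:.2f}")
print(f"small minority: D = {d2:.2f}, S = {s2:.2f}, minority mean p = {m2:.2f}")
```

Both cities have the same D, but the smaller minority group lives on average in majority White areas and has a much lower S, which is the pattern the parity-based scoring cannot register.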


9.8 Multivariate Segregation Attainment Analysis (SAA) 


The bivariate regression examples just discussed are interesting and useful in their 
own right. They illustrate some of the benefits of directly modeling the individual- 
level residential outcomes that give rise to segregation index scores. Specifically, 
the approach enables and encourages more thoughtful and careful interpretation of 
race effects on residential outcomes across group comparisons and different indices. 
In the long run, however, the bivariate regressions are just a useful preliminary step 
toward investigating how the impact of race on segregation compares with the 
impacts of other social characteristics. This can be done by conducting micro-level 
analyses of segregation-relevant residential outcomes using multivariate 
attainment models in the manner that is already universal in other literatures 
investigating racial disparities. 

I term this new approach “segregation attainment analysis” (SAA). The justifica- 
tion for the label is that the effect of race in bivariate models corresponds directly to 
the aggregate level of segregation in the city and its effect in multivariate models 
yields insights into how the impact of race should be assessed and interpreted when 
taking account of the role of non-racial factors that also impact residential 
outcomes. 

This can be accomplished by extending the micro-level attainment regressions to 
include additional independent variables beyond race. In this case, I used the tabula- 
tion of race by family type by poverty status to fashion the following independent 
variables: poverty status (0,1), married with spouse present (0,1), and presence of 
children under age 18 (0,1). Table 9.1 previously presented descriptive statistics 
based on this data set. It documents that the four groups in the analysis vary greatly 
on these variables. Non-poverty status runs from a high of 95.8% for Whites to a 
low of 80.2% for Latinos. Percent of families that are married couple families runs 
from a high of 84.9% for Asians to a low of 51.2% for Blacks. And percent of 
families with children under age 18 runs from a high of 69.2% for Latinos to a low of 
47.3% for Whites. Given these group differences in distribution across social 
characteristics, obvious questions arise: “What role do these characteristics play in 
shaping residential outcomes that determine segregation?”, “How does their role 
compare with the role of race?”, and “How does the estimated effect of race change 
when other characteristics are controlled?” 

Tables 9.6, 9.7, and 9.8 report results of bivariate and multivariate regression 
analyses that can be used to address these and related questions. Each table has five 
panels. Each of the five panels presents results from regression analyses predicting 
dependent variables that additively determine segregation indices. The regression 
analyses are estimated and reported separately by racial group for ease of discussion 
and presentation. For hypothesis testing and for cross-time and cross-city compari- 
sons it may be more appropriate to estimate single-equation specifications which 
incorporate additive and non-additive race effects. White-Black comparisons are 
reported in Table 9.6, White-Latino comparisons in Table 9.7, and White-Asian 
comparisons in Table 9.8. 
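
A group-specific attainment regression of this kind can be sketched without any statistical package. The example below is a minimal illustration under assumptions made for the example: the cell counts and the `assumed` coefficients are hypothetical, the outcome is constructed to be exactly additive so that the fit can be checked, and the helpers `solve` and `weighted_ols` are written here rather than taken from a library. The last line computes the difference between the White and minority coefficients, which is how the "Net impact" column in Tables 9.6 and 9.7 appears to be formed.

```python
from itertools import product

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def weighted_ols(rows, y, w):
    """Frequency-weighted least squares: solve (X'WX) b = X'Wy."""
    k = len(rows[0])
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(wi * r[i] * yi for r, yi, wi in zip(rows, y, w)) for i in range(k)]
    return solve(xtwx, xtwy)

# Hypothetical coefficients: constant, non-poverty, married couple, children.
assumed = {"White": [83.0, 4.0, 3.0, 1.0], "Black": [17.0, 10.0, 8.0, 6.0]}
# Hypothetical family counts for the eight profiles of the three dummies.
counts = {"White": [5, 10, 8, 20, 30, 60, 90, 400],
          "Black": [40, 60, 30, 50, 45, 70, 35, 120]}
profiles = list(product([0, 1], repeat=3))

fitted = {}
for group in assumed:
    rows = [[1, *p] for p in profiles]  # leading 1 is the intercept column
    # Outcome built to be exactly additive; real cell means would not be.
    y = [sum(b * x for b, x in zip(assumed[group], r)) for r in rows]
    fitted[group] = weighted_ols(rows, y, counts[group])

net_impact = [bw - bb for bw, bb in zip(fitted["White"], fitted["Black"])]
print([round(v, 2) for v in net_impact])
```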

Analyses of this sort can be used to gain a richer understanding of the residential 
attainment process that gives rise to segregation by permitting direct examination 
and comparison of the separate effects of racial and non-racial social characteristics 
on residential outcomes. Table 9.6 presents results relevant for the analysis of 
White-Black segregation. Results are presented separately for five indices. I begin 
by discussing the results for the separation index (S) reported in the fifth panel in the 
table. The first and second columns report separate regressions for Whites and 
Blacks with no other independent variables included in the model. The constants in 
these equations of course equal the group means for scaled contact with Whites (y). 
In the case of the separation index (S) y is given by the pairwise proportion White 
(p) in the block group and the difference between the two group means yields the value 
of the separation index. This is reported as “White Advantage (S)” which has the 
value of 57.38. This value of S was reported previously in Tables 5.2, 5.3 and 9.5 
and thus confirms the equivalence of the different approaches to assessing 
segregation. 

Table 9.6 Group-specific attainment regressions for White-Black segregation 

                           Simple comparison   Comparison with controls
Variable                    Whites    Blacks      Whites    Blacks   Net impact
Residential outcome (y) scored for Gini index (G)
  Non-poverty family             -         -        5.66      8.37        -2.71
  Married couple family          -         -        8.62      7.83         0.79
  Children present               -         -        0.79      4.58        -3.79
  Constant                  120.71     33.64      107.67     20.03        87.64
  White advantage (G)                  87.07                 87.64        81.93
Residential outcome (y) scored for index of dissimilarity (D)
  Non-poverty family             -         -        6.72      6.45         0.27
  Married couple family          -         -        5.76      8.95        -3.19
  Children present               -         -        2.89      3.07        -0.18
  Constant                   87.73     16.75       75.09      5.05        70.04
  White advantage (D)                  70.98                 70.04        66.94
Residential outcome (y) scored for Hutchens index (R)
  Non-poverty family             -         -        3.72      6.70        -3.98
  Married couple family          -         -        3.67      5.29        -1.62
  Children present               -         -        0.33      3.98        -3.65
  Constant                   72.98     25.96       67.14     15.38        60.42
  White advantage (R)                  47.02                 60.42        42.52
Residential outcome (y) scored for Theil index (H)
  Non-poverty family             -         -        3.37      8.19        -4.82
  Married couple family          -         -        3.75      6.86        -3.11
  Children present               -         -        0.93      4.83        -3.90
  Constant                   82.57     28.98       75.75     15.85        59.90
  White advantage (H)                  53.59                 59.90        48.07
Residential outcome (y) scored for separation index (S)
  Non-poverty family             -         -        3.58      9.59        -6.01
  Married couple family          -         -        3.34      8.19        -4.85
  Children present               -         -        1.20      5.67        -4.47
  Constant                   89.86     32.48       83.06     17.01        66.05
  White advantage (S)                  57.38                 66.05        50.72

Source: Summary File 3, Houston, Texas, 2000. Sample N: 627,613 for Whites and 195,928 for 
Blacks. All coefficients are statistically significant at 0.001 or better 


The third and fourth columns report multivariate regressions separately for 
Whites and Blacks. Each equation has three independent variables — non-poverty 
status, married couple family, and presence of children — which have been coded as 
dummy (i.e., 0,1) variables. In this specification, the intercept of the equation can be 
interpreted as the expected group mean on scaled contact with Whites for families 
that are in poverty, are not married couple families, and do not have children
residing with them.
Table 9.7 Group-specific attainment regressions for White-Latino segregation

                           Simple comparison  Comparison with controls
Variable                    Whites   Latinos    Whites   Latinos  Net impact
Residential outcome (y) scored for gini index (G)
Non-poverty family               -         -     14.94     15.33       -0.39
Married couple family            -         -     16.56      6.33       10.23
Children present                 -         -      5.28     -2.70        7.98
Constant                    123.72     49.53     93.01     34.41       58.60
White advantage (G)          74.19               58.60                 76.42
Residential outcome (y) scored for index of dissimilarity (D)
Non-poverty family               -         -     12.92     11.46        1.46
Married couple family            -         -      9.94      5.32        4.62
Children present                 -         -      4.90     -3.76        8.66
Constant                     81.49     23.12     58.46     12.58       45.88
White advantage (D)          58.37               45.88                 60.62
Residential outcome (y) scored for hutchens index (R)
Non-poverty family               -         -      5.13      7.18       -2.05
Married couple family            -         -      5.10      2.56        2.54
Children present                 -         -      1.69     -0.73        2.42
Constant                     63.37     35.26     53.38     28.10       25.28
White advantage (R)          28.11               25.28                 28.19
Residential outcome (y) scored for theil index (H)
Non-poverty family               -         -      6.60      9.33       -2.73
Married couple family            -         -      6.01      3.34        2.67
Children present                 -         -      2.91     -1.01        3.92
Constant                     72.95     37.50     60.49     28.24       32.25
White advantage (H)          35.45               32.25                 36.11
Residential outcome (y) scored for separation index (S)
Non-poverty family               -         -      7.65     11.30       -3.65
Married couple family            -         -      6.23      3.97        2.26
Children present                 -         -      2.12     -1.24        3.96
Constant                     80.12     40.17     67.28     29.01       38.27
White advantage (S)          40.95               38.27                 40.84

Source: Summary File 3, Houston, Texas, 2000. Sample N is 627,613 for Whites and 294,931 for
Latinos. All coefficients are statistically significant at 0.001 or better


The difference between the intercepts of the two equations can be interpreted as 
a White-Black segregation comparison that has been “standardized” to control for 
group differences in distributions on social characteristics. That is, the comparison 
reflects group differences on model predicted means on segregation-determining 
residential outcomes for White and Black families that are matched on social 
characteristics. For both Whites and Blacks the level of average contact with Whites
for the subgroup reflected at the intercept is lower than the group’s overall mean. 
The value for Whites is 83.06 which is 6.80 points lower than the value of 89.86 
reported in the “constant only” equation for Whites. The value for Blacks is 17.01 
Table 9.8 Group-specific attainment regressions for White-Asian segregation

                           Simple comparison  Comparison with controls
Variable                    Whites    Asians    Whites    Asians  Net impact
Residential outcome (y) scored for gini index (G)
Non-poverty family               -         -    -11.69      5.97      -17.66
Married couple family            -         -      0.91      4.11       -3.20
Children present                 -         -     -2.67     -0.11       -2.56
Constant                    106.22     29.94    117.91     21.12       96.79
White advantage (G)          76.28               96.79                 73.37
Residential outcome (y) scored for index of dissimilarity (D)
Non-poverty family               -         -     -4.98      3.09       -8.07
Married couple family            -         -      2.72      3.95       -1.23
Children present                 -         -     -0.17     -1.29        1.12
Constant                     75.14     16.93     77.72     11.55       66.17
White advantage (D)          58.21               66.17                 57.99
Residential outcome (y) scored for hutchens index (R)
Non-poverty family               -         -     -5.68      4.38      -10.06
Married couple family            -         -      0.17      2.42       -2.25
Children present                 -         -     -1.57      0.27       -1.84
Constant                     70.69     35.74     76.74     29.56       47.18
White advantage (R)          34.95               47.18                 33.03
Residential outcome (y) scored for theil index (H)
Non-poverty family               -         -     -2.91      6.71       -9.62
Married couple family            -         -      0.74      3.48       -2.74
Children present                 -         -     -0.59      0.54       -1.13
Constant                     83.47     52.16     85.91     42.82       43.09
White advantage (H)          31.31               43.09                 29.60
Residential outcome (y) scored for separation index (S)
Non-poverty family               -         -     -0.72      8.86       -9.58
Married couple family            -         -      0.81      4.18       -3.37
Children present                 -         -      0.00      0.97       -0.97
Constant                     93.79     69.91     93.80     57.78       36.02
White advantage (S)          23.88               36.02                 22.10

Source: Summary File 3, Houston, Texas, 2000. Sample N is 627,613 for Whites and 55,746 for
Asians. Regression coefficients not significant at 0.01 are in gray italics


which is 15.47 points lower than the value of 32.48 reported in the “constant only”
equation for Blacks.


The White-Black difference of 66.05 at the intercept (83.06 minus 17.01) is 
reported in the third column of the “White Advantage” row. (As discussed below, it 
also is reported as a “net impact” in the fifth column.) This value can be understood 
as the impact of race on expected scaled contact with Whites for White and Black 
families that have the specific configuration of social characteristics associated with 
the intercept of the multivariate equation. Thus, it is the White-Black difference on 
average scaled contact with Whites predicted under the model for families coded 
zero on all three independent variables (i.e., for non-married couple families with no
children present and in poverty). In the bivariate regressions the impact of race rep-
resents the level of segregation in the city because it is exactly equal to the segrega-
tion index score. In the multivariate specification the impact of race can be interpreted
as the expected level of segregation between Whites and Blacks when group differ-
ences in distribution on other social characteristics are controlled.

The group-specific regression coefficients reported in columns 3 and 4 give 
insights into how the three social characteristics included in the regression impact 
the residential attainments that additively determine segregation as measured by 
S. The regression coefficients for this analysis indicate that all three variables — non- 
poverty status, married couple status, and presence of children — have positive 
effects on family attainments of the residential outcome of scaled contact with 
Whites. This pattern is generally consistent across the multivariate regression analy- 
ses reported for all three White-Minority comparisons and for all five segregation 
indices considered. The effect of non-poverty status is positive and statistically sig- 
nificant in all equations. The effect of married couple status is positive and statisti- 
cally significant in almost all equations. The effect of presence of children is less 
consistent. In the analyses for White-Black segregation it is positive and statistically 
significant in all of the equations but is small in size for Whites in analyses for some 
measures of segregation. In the analyses for White-Latino segregation the effect is 
positive for Whites and negative for Latinos. In the analyses for White-Asian segre- 
gation it is mixed in terms of direction but consistently small (absolute value under 
1.0 in 7 of 10 possible cases). 

The question of how these social characteristics impact segregation is answered 
by examining whether their effects ultimately reduce White-Minority differences on 
segregation-determining residential outcomes. For example, in the analyses for 
White-Black segregation moving from poverty to non-poverty status increases 
Black contact with Whites by 9.59 points. The comparable effect for Whites is 3.58. 
The “net impact” (i.e., White-Black effect difference) is —6.01 points and is reported 
in column five. This has direct implications for segregation. Specifically, it indicates 
that if one starts with White and Black families in poverty that are matched on other
social characteristics and then moves these families from poverty to non-poverty status,
segregation would be reduced by 6.01 points. As a quick methodological aside, this “net
impact” interpretation is based on using a linear, additive regression specification.
Moving to a nonlinear and/or non-additive model for estimating effects of non-
racial characteristics would require a more nuanced approach to assessing effects.¹⁰

In the analysis of White-Black segregation as measured by S the “net impact” 
(i.e., White-Black effect difference) is negative for all three social characteristics 
considered. Thus, in the same sense that the “net impact” indicates that moving 
from “poverty” to “non-poverty” reduces segregation by 6.01 points, moving from 
“non-married couple” to “married couple” on family type reduces segregation by


10 Specifically, in nonlinear and/or non-additive models, the impact of a change from poverty to 
non-poverty would have to be assessed separately for each initial configuration of the other
social characteristics. 
4.85 points and moving from “without children” to “with children” reduces segre-
gation by 4.47 points. In the context of the linear, additive model used here, imple-
menting all three “net impact” effects simultaneously would reduce the expected 
“White Advantage” in contact with Whites by 15.33 points; it would move the 
“White Advantage” from 66.05 at the intercept — that is, for the White-Black com- 
parison standardized to non-married couple families without children and in pov- 
erty — to 50.72 for the White-Black comparison standardized to non-poverty, 
married couple families with children. This is reported in column five of the “White 
Advantage” row in the results. 
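
The arithmetic behind the combined figure can be checked directly from the S panel of Table 9.6. The short Python sketch below is illustrative only: the variable names are hypothetical, and the inputs are the published coefficients rather than a re-estimation from the underlying data.

```python
# Coefficients from the separation index (S) panel of Table 9.6
# (White-Black comparison), stored as (White effect, Black effect).
effects = {
    "non_poverty": (3.58, 9.59),
    "married_couple": (3.34, 8.19),
    "children_present": (1.20, 5.67),
}
white_constant, black_constant = 83.06, 17.01

# Net impact = White effect minus Black effect. A negative value means the
# characteristic raises Black contact with Whites more than White contact.
net_impact = {name: round(w - b, 2) for name, (w, b) in effects.items()}

# White advantage at the intercept (all three characteristics coded 0).
advantage_at_intercept = round(white_constant - black_constant, 2)

# Under a linear, additive model the separate net impacts simply sum.
combined = round(sum(net_impact.values()), 2)
advantage_all_ones = round(advantage_at_intercept + combined, 2)

print(net_impact)
print(advantage_at_intercept, combined, advantage_all_ones)
```

The three net impacts sum to -15.33, which carries the White advantage from 66.05 to 50.72, matching the values cited in the text.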

These results help clarify how the impacts of racial group membership and non- 
racial social characteristics on segregation can be investigated in a more careful and 
nuanced way. The “net impact” reported in column five provides insight into the 
proximate impact of group differences on social characteristics on segregation. The 
regression coefficients reported in columns three and four clarify how the “net 
impact” comes about. Including the group-specific regression constants in the dis-
cussion provides a basis for assessing how the additive and non-additive effects of
race compare with the effects of other factors in shaping segregation.

In the multivariate framework a wide range of logical possibilities can be imag- 
ined. At one extreme all block groups could have identical values on pairwise pro- 
portion White. In this possible but unlikely scenario of exactly zero race segregation 
the regression coefficients for non-racial social characteristics would be zero in both 
group equations and the intercepts of both equations would be identical. Another 
possibility is that race segregation is present and is due only to simple additive 
effects of race. In one scenario for this pattern, the regression coefficients for non- 
racial social characteristics are zero in both group equations but the intercept is 
higher in the equation for Whites and lower in the equation for Blacks. In a more 
complex scenario, the group equations differ at the intercept as just described and 
the regression coefficients for other social characteristics are not zero but are identi- 
cal for both groups and both groups have identical distributions on the social 
characteristics. 

A more plausible scenario is that race segregation is present and is produced by 
a complex combination of contributing factors including the following: additive 
race effects (i.e., differences at the intercepts of the attainment equation), non- 
additive race effects (i.e., race differences in the effects of non-racial characteris- 
tics), race differences in distribution on non-racial social characteristics, and the 
“interaction” of the last two factors. The results in Table 9.6 provide evidence that 
additive race effects are the most important factor contributing to segregation. The 
“White Advantage” of 66.05 reported in column three is one estimate of the quanti- 
tative contribution. This value can be described as the impact of race on segregation- 
determining residential outcomes for non-married couple families without children 
who are in poverty. That is, it is the value that would be estimated for the effect of 
being White (coded 1 if White and 0 if Black) if the regression analyses reported in 
columns 3 and 4 were replicated in a single equation specification using the com- 
bined samples. 
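
The equivalence between the group-specific equations and a single pooled equation can be illustrated with simulated data. The sketch below is a hedged illustration, not the analysis reported here: the data are synthetic, the parameter values are hypothetical (loosely patterned on the S panel of Table 9.6), and it assumes the pooled specification includes a White indicator plus its interactions with each social characteristic.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, const, betas):
    """Draw three 0-1 characteristics and a noisy outcome for one group."""
    X = rng.integers(0, 2, size=(n, 3)).astype(float)
    y = const + X @ np.array(betas) + rng.normal(0, 5, size=n)
    return X, y

# Hypothetical parameter values used only to generate example data.
Xw, yw = simulate(4000, 83.0, [3.6, 3.3, 1.2])   # "White" sample
Xb, yb = simulate(2000, 17.0, [9.6, 8.2, 5.7])   # "Black" sample

def ols(X, y):
    """OLS coefficients with the intercept in position 0."""
    design = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0]

# Group-specific equations (analogous to columns 3 and 4).
bw, bb = ols(Xw, yw), ols(Xb, yb)

# Pooled equation: White dummy, characteristics, and White-by-characteristic
# interaction terms, estimated on the combined samples.
X = np.vstack([Xw, Xb])
white = np.concatenate([np.ones(len(yw)), np.zeros(len(yb))])
bp = ols(np.column_stack([white, X, X * white[:, None]]), np.concatenate([yw, yb]))

race_effect_at_intercept = bp[1]   # coefficient on the White dummy
net_impacts = bp[5:8]              # interaction coefficients

print(race_effect_at_intercept, bw[0] - bb[0])
print(net_impacts, bw[1:] - bb[1:])
```

In this setup the coefficient on the White dummy reproduces the difference between the two group-specific intercepts, and the interaction coefficients reproduce the differences between the group-specific slopes.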
The value of the intercept enters into all predictions and in this model specifica-
tion the intercept corresponds to a set of families with a specific profile on social
characteristics. So it is fair to describe the observed race difference at the intercept
as applying “across the board” since it reflects the expected level of segregation when
social characteristics are fixed. Of course, the specific value of the intercept can vary 
depending on how variables are coded. So it is reasonable to ask whether the value 
of 66.05 is a fair or representative choice among all of the possible estimates of 
expected segregation for White-Black comparisons matched on social characteris- 
tics. The model predictions provide one answer to that question. Since all net impact 
calculations in column 5 are negative, 66.05 is the maximum race difference the 
attainment models will predict for White and Black families matched on all social 
characteristics. In contrast, the race difference of 50.72 predicted for the White- 
Black comparison for non-poverty, married-couple families with children present is 
the minimum difference the attainment analysis will predict for White and Black fami-
lies matched on all social characteristics.
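
One way to check the maximum and minimum predicted differences is to enumerate all eight possible profiles on the three 0-1 characteristics. The sketch below is a minimal illustration that assumes the linear, additive specification and uses the published S-panel coefficients from Table 9.6; the function and variable names are hypothetical.

```python
from itertools import product

# Published coefficients, separation index (S) panel of Table 9.6.
white = {"const": 83.06, "non_poverty": 3.58, "married": 3.34, "children": 1.20}
black = {"const": 17.01, "non_poverty": 9.59, "married": 8.19, "children": 5.67}
names = ["non_poverty", "married", "children"]

def predict(coefs, profile):
    """Predicted scaled contact with Whites under the linear, additive model."""
    return coefs["const"] + sum(coefs[n] * x for n, x in zip(names, profile))

# White-Black predicted difference for every matched profile (0/1 on each item).
gaps = {
    profile: round(predict(white, profile) - predict(black, profile), 2)
    for profile in product((0, 1), repeat=3)
}

largest = max(gaps, key=gaps.get)
smallest = min(gaps, key=gaps.get)
print(largest, gaps[largest])
print(smallest, gaps[smallest])
```

Because every net impact is negative, the largest predicted difference occurs for the profile coded zero on all three characteristics and the smallest for the profile coded one on all three.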

This is useful information to consider. One can also apply predictions from the 
model to a hypothetical “standard” distribution of social characteristics to obtain 
expected White-Black differences on segregation-determining residential outcomes 
for “matched distributions.” Results for this kind of standardization analysis were 
reported in Table 9.4 based on adopting the combined group distributions as the 
“standard” for matching Whites and Blacks on social characteristics. The White- 
Black difference obtained under this calculation was 54.38, which necessarily falls 
between the minimum and maximum predicted differences of 50.72 and 66.05. 

The question at hand here is how the effect of race on segregation compares to 
the effect of other social characteristics. A range of estimates of the impact of race 
are on the table. The “net impact” estimates in column 5 provide one basis for 
assessing the impact of other social characteristics on segregation. The separate net 
impact estimates range from —4.47 to —6.01 and are small compared to the race 
effect. The impact of non-poverty status is the largest of the three values and its 
magnitude is less than 12 % of the lower-bound estimate of the additive impact of 
race. If one combines the impacts of all social characteristics to obtain the maxi- 
mum possible combined effect on reducing segregation, the result is 15.33 which is 
about 30 % of the lower-bound estimate of the additive impact of race. On this basis, 
one can argue that race is clearly the dominant factor impacting residential out-
comes that determine segregation. Poverty status, family type, and presence of
children do impact segregation. But their effects on segregation are small compared 
to the broad effect of race. 

Standardization and decomposition analysis can provide additional perspective 
on the role non-racial social characteristics play in shaping segregation. For exam-
ple, Table 9.1 reported that only 80.9 % of Black families were not in poverty com-
pared to 95.8 % of White families. If the non-poverty rate for Black families
were increased to match the rate observed for White families, the model indicates
segregation would be reduced by 1.43 points. This is less than 3 % of the lower- 
bound estimate of the effect of race. From many different vantage points, the analy- 
sis consistently indicates that White-Black differences in distribution on social 
characteristics are not the major factor in determining segregation; the vast majority
of segregation is due to expected mean differences on segregation-determining out-
comes (in the case of S, pairwise contact with Whites) between Whites and Blacks
matched on social characteristics. 
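
The 1.43-point figure appears consistent with a simple calculation: multiply the Black non-poverty coefficient by the change in the Black non-poverty rate. The text does not spell out the computation, so the sketch below is one plausible reading under that assumption, using the S-panel coefficient from Table 9.6 and the rates cited from Table 9.1; the variable names are hypothetical.

```python
# Effect of non-poverty status on Black contact with Whites (Table 9.6, S panel).
black_non_poverty_effect = 9.59

# Share of families not in poverty, as cited from Table 9.1.
black_rate, white_rate = 0.809, 0.958

# Raising the Black non-poverty rate to the White rate increases the Black
# group mean on the outcome, which narrows the White-Black difference.
gain_in_black_mean = black_non_poverty_effect * (white_rate - black_rate)

# Express the reduction relative to the lower-bound race effect of 50.72.
lower_bound_race_effect = 50.72
share_of_race_effect = gain_in_black_mean / lower_bound_race_effect

print(round(gain_in_black_mean, 2), round(100 * share_of_race_effect, 1))
```

Under this reading the reduction rounds to 1.43 points, which is under 3 % of the lower-bound estimate of the race effect.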

Similar findings emerge in the analyses of residential outcomes relevant for 
determining the separation index (S) for White-Latino segregation (reported in 
Table 9.7) and for White-Asian segregation (reported in Table 9.8). In both cases, 
the net impact calculation for race based on the multivariate analysis of segregation- 
determining residential attainments (i.e., the value of White advantage reported in 
column 5) is much larger than the net impact calculations for the other social char- 
acteristics included in the analysis. The same general finding holds up across all 
three White-Minority segregation comparisons in analyses focusing on residential 
attainments relevant for G, D, R, and H. That is, the net impact of race on index- 
specific, segregation-determining residential outcomes is consistently much larger 
than the net impact estimates for the other social characteristics considered in these 
analyses. 

These general conclusions are appropriate. But close inspection of the detailed 
results reveals interesting differences across White-Minority comparisons and 
across analyses focusing on different segregation indices. For example, in the case 
of White-Black segregation, the net impact calculation for non-poverty status varies 
across indices. Its absolute and relative magnitude is largest in the analysis focusing 
on the separation index (S) and is small to modest in the analyses focusing on the
gini index (G) and the dissimilarity index (D). This is also true in the case of White- 
Latino segregation. But the pattern is different in the analysis results for White- 
Asian segregation. Here the net impact calculation for non-poverty status is sizeable 
for all indices and largest of all in the results for the gini index (G). 

I conclude this section by noting that other interesting results can be discovered 
by making comparisons across groups. For example, the combined net impact cal- 
culations for married couple status and children present serve to reduce White- 
Black segregation across analyses for all segregation indices. A very similar pattern 
is also found in the results for the analyses of White-Asian segregation. But a much 
different pattern is seen in the results for the analyses of White-Latino segregation. 
The combined net impact calculations for married couple status and children pres-
ent serve to increase segregation in the analyses for all segregation indices with the
magnitude of the combined impact being especially large in the case of the gini
index (G) and the dissimilarity index (D). These intriguing results highlight
how the new approach opens the door for pursuing more careful exploration of the 
social processes that produce White-Minority segregation. Future research may pro- 
vide insight into why family structure appears to play a different role in White- 
Latino segregation in comparison with White-Black and White-Asian segregation. 
These and other possibilities for future analysis highlight the advantages of adopting 
the difference of means framework and embracing its capabilities for exploring seg- 
regation patterns in more detail. 
9.9 Unifying Aggregate Segregation Studies and Studies
of Individual-Level Residential Attainment 


For many decades, dating back at least to the late 1960s, studies of segregation have 
followed one path while studies of racial and ethnic inequality and disparity on 
socioeconomic outcomes such as education, occupation, income, wealth, and home 
ownership have followed a different path. In the broader literature on racial socio- 
economic inequality and disparity it is conventional to see racial disparities on 
socioeconomic attainment outcomes (e.g., education, income, etc.) as emerging 
from micro-level processes of attainment. Accordingly, research focusing on inter- 
group inequality and disparity on most socioeconomic outcomes draws on micro- 
level attainment models to understand and analyze group differences on 
socioeconomic attainments. 

This has not been the case in the study of residential segregation. To be fair, 
researchers understand that at some level residential segregation arises from micro- 
level processes wherein individuals and groups seek, compete for, and attain (or fail 
to attain) particular residential outcomes. But past statements on segregation mea- 
surement have focused almost exclusively on the task of aggregate-level descrip- 
tion. Relatively little attention has been given to developing connections between 
index scores for uneven distribution and residential outcomes for individuals and 
families that are considered in studies of residential attainment. 

I noted earlier that Duncan and Duncan lamented this fact, observing that the lit-
erature on segregation indices provided no “suggestion about how to use them to
study the process of segregation” (1955:216, emphasis in original). Unfortunately, 
the negative assessment they offered more than five decades ago applies with equal 
force today. Research clarifying how micro-level attainment dynamics give rise to 
aggregate segregation as measured by popular indices of uneven distribution is not 
well-developed. In my view, the point of concern that Duncan and Duncan raised 
has taken on much greater importance in the five decades that have passed since 
their study. In general, research on racial and ethnic differences in socioeconomic 
outcomes has advanced considerably based on steady, cumulative improvements in 
our understanding of how group differences in aggregate attainments arise from 
micro-level attainment dynamics. But this has not been the case in the subfield of 
segregation research. Until now there has been little progress in developing a better 
understanding of how aggregate level segregation (as measured by indices of uneven 
distribution) is linked with individual-level residential outcomes and the micro-level 
processes that shape them. 

Of course, there is a large and vital literature that investigates micro-level dynam-
ics of residential attainment. Studies using individual-level data to focus on
spatial assimilation and spatial attainment first appeared in the 1980s (e.g., Massey 
and Mullan 1984; Massey and Denton 1985) and then with increasing frequency in 
the 1990s and beyond (e.g., Alba and Logan 1993; Alba et al. 1999; Bayer et al. 
2004; Crowder and South 2005; Crowder et al. 2006; Logan et al. 1996; South and 
Crowder 1997, 1998; South et al. 2005a, b; South et al. 2008). But, as valuable as
this literature has been, it has remained fundamentally disconnected from the litera-
ture investigating segregation at the aggregate level. The reason for this is simple;
the literature on segregation measurement has never provided a simple, direct strat- 
egy for connecting segregation at the aggregate level (i.e., for a city) to individual 
residential attainments. 

Casting indices of uneven distribution as group differences in means on indi- 
vidual residential outcomes addresses this gap in segregation studies. It establishes 
a simple, direct connection between individual residential outcomes and segrega- 
tion index scores and in doing so creates the possibility of unifying studies of aggre- 
gate segregation and studies of residential attainment in a common overarching 
framework. Specifically, this approach allows researchers to simultaneously investi- 
gate both individual residential attainments and aggregate segregation in a single 
analysis. I noted earlier in this chapter that aggregate segregation now can be under- 
stood as the effect of group membership (coded 0-1) on the relevant residential 
outcome in a simple bivariate regression model of individual residential attain-
ment.¹¹ But this is only a starting point for analysis, not an end point. The approach
can be readily extended in a variety of ways that move the investigation of segrega- 
tion beyond simply assessing aggregate-level uneven distribution. 
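
The link between the index score and the bivariate regression can be illustrated for the separation index (S), where the individual outcome is pairwise proportion White in the area of residence. The sketch below uses hypothetical block-group counts; it compares the difference of group means, the OLS slope on a 0-1 group indicator, and the familiar variance-ratio form of S computed from aggregate counts.

```python
# Hypothetical block-group counts as (White, Black) households; illustrative only.
areas = [(90, 10), (70, 30), (40, 60), (5, 95), (20, 80)]

# One record per household: x = 1 if White, 0 if Black;
# y = pairwise proportion White in the household's area.
records = []
for w, b in areas:
    p = w / (w + b)
    records += [(1, p)] * w + [(0, p)] * b

def mean(values):
    return sum(values) / len(values)

# Difference of group means on the individual-level outcome.
diff_of_means = (mean([y for x, y in records if x == 1])
                 - mean([y for x, y in records if x == 0]))

# Slope from a bivariate OLS regression of y on the group indicator.
x_bar = mean([x for x, _ in records])
y_bar = mean([y for _, y in records])
slope = (sum((x - x_bar) * (y - y_bar) for x, y in records)
         / sum((x - x_bar) ** 2 for x, _ in records))

# Aggregate variance-ratio form of S computed from area counts.
T, P = len(records), x_bar
s_aggregate = sum((w + b) * (w / (w + b) - P) ** 2 for w, b in areas) / (T * P * (1 - P))

print(diff_of_means, slope, s_aggregate)
```

All three quantities agree up to floating-point error, which is the sense in which the effect of group membership in the bivariate model equals the index score.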

Casting segregation as a difference of means on individual residential outcomes 
puts the investigation of segregation on the same methodological footing as the 
investigation of inter-group inequality and disparity on other important socioeco- 
nomic outcomes such as education and income. The key to this is that group dispar- 
ity is conceived and modeled as emerging directly from an individual-level 
attainment process. This fundamental change in conceptualization opens up impor- 
tant new options for research. For example, it makes it possible to assess the role of 
social characteristics such as income using fine-grained measurement such as con- 
tinuous measurement of income instead of crude category distinctions as used in 
current practice. Even more importantly, it makes it possible to take account of 
multiple social characteristics in analyses investigating group segregation; some- 
thing that is difficult if not impossible to implement using standard methodological 
approaches to investigating segregation. 

These new options become possible because multivariate modeling of individual 
residential outcomes provides a superior — specifically, a statistically more effi- 
cient — framework for taking account of the role of multiple social characteristics 
(including both race and non-racial characteristics). In this context, implications for 
aggregate-level segregation can be assessed using methods that are widely used in 


11 The analysis can be conducted using conventional OLS regression or analysis of variance.
Statistical tests that rest on the assumption of normality and equal variances for the error term 
across groups may be questionable on technical grounds in some cases. But the typically large 
sample sizes used in segregation studies will minimize concerns about these issues. In any event, 
many good statistical alternatives are available. Bootstrapping or other methods may be used to
perform statistical tests that do not rely on assumptions regarding normality and equal variances. 
Alternatively, the effect of group membership can be assessed using more technically appropriate 
modeling frameworks such as fractional regression (Papke and Wooldridge 1996) or beta regres- 
sion (Smithson and Verkuilen 2006; Buis 2006; Buis et al. 2006). 
the study of racial inequality and disparity in other socioeconomic outcomes. For
example, regression standardization methods can be used to examine differences in 
residential outcomes for groups that are statistically matched on relevant social 
characteristics (i.e., other than group membership). Similarly, components analysis 
can be used to assess the contributions to aggregate segregation of group differences 
in attainment resources and group differences in ability to convert resources into 
attainments. These and related methods provide valuable new options for gaining a 
better understanding of the factors that produce segregation and new options for 
exploring the potential of different policies to impact aggregate segregation. 
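
A components analysis of this kind can be sketched with simulated data. The example below is a hedged illustration rather than the procedure used in this chapter: the data are synthetic, all parameter values are hypothetical, and it uses one common decomposition that takes the minority group's means as the reference, splitting the difference of group means into an intercept difference, slope differences, distribution differences, and their interaction.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, const, betas, rates):
    """Simulate 0-1 characteristics with given prevalence and a noisy outcome."""
    X = (rng.random((n, len(rates))) < np.array(rates)).astype(float)
    y = const + X @ np.array(betas) + rng.normal(0, 5, size=n)
    return X, y

def fit(X, y):
    """Return (intercept, slopes) from an OLS fit with a constant."""
    design = np.column_stack([np.ones(len(y)), X])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    return coef[0], coef[1:]

# Hypothetical values chosen only for illustration.
Xw, yw = simulate(5000, 83.0, [3.6, 3.3, 1.2], [0.96, 0.85, 0.45])
Xb, yb = simulate(3000, 17.0, [9.6, 8.2, 5.7], [0.81, 0.50, 0.55])

(aw, bw), (ab, bb) = fit(Xw, yw), fit(Xb, yb)
mw, mb = Xw.mean(axis=0), Xb.mean(axis=0)   # group means on characteristics

components = {
    "additive race effect": aw - ab,               # difference at the intercepts
    "non-additive race effects": (bw - bb) @ mb,   # slope differences
    "distribution differences": bb @ (mw - mb),    # differences in resources
    "interaction": (bw - bb) @ (mw - mb),
}
total_gap = yw.mean() - yb.mean()   # the difference of group means

for name, value in components.items():
    print(f"{name}: {value:.2f}")
print(f"total: {total_gap:.2f}")
```

Because each OLS equation with a constant reproduces its group mean exactly, the four components sum to the overall difference of group means. The split among components depends on which group's means and coefficients serve as the reference, so other choices would give different shares.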

Regression-based analysis carries advantages on all these points. In general, the 
advantages derive from the fact that multivariate regression analysis is a more sta- 
tistically efficient method with which to account for the effects of multiple social 
characteristics when comparing groups on average attainments on residential out- 
comes. Specifically, the statistical efficiencies of the regression standardization 
approach make it feasible to: (a) incorporate multiple non-racial social characteris- 
tics in the analyses and obtain reliable estimates of their separate effects on relevant 
residential attainments, (b) model the role of continuous social characteristics (e.g., 
income) in as much detail as the tabulations (or, as will be discussed below, micro- 
data) will permit, (c) perform comparisons in cities where the small relative size of 
the minority population makes application of previous approaches problematic, and 
(d) perform significance tests of the role of race (i.e., group membership) on resi-
dential outcomes with social characteristics controlled.¹²

The empirical examples reviewed here provide preliminary illustrations of how 
the new methods can be used to good effect. But the next section shows that the 
examples introduced above only hint at what is possible. The new methods used in 
these examples permit one to imagine new options for analysis using micro data that 
can go far beyond what might be accomplished using traditional approaches for 
incorporating non-racial social characteristics into segregation analyses. 


9.10 New Possibilities for Investigating Segregation Using 
Restricted Data 


The methods introduced in this chapter permit researchers to investigate segregation 
in more detail than was previously possible. But the potential benefits of the new 
methods are relatively modest when segregation is investigated using publicly dis- 
tributed census summary file tabulations. Summary file tabulations have been the 
“life blood” of segregation research to date. They have sustained traditional 
approaches to investigating residential segregation and, at least to some degree, they 
also can sustain analyses of individual residential attainments of the kind just 
reviewed. But public summary file tabulations have major limitations. For example, 


12 As discussed in an earlier note, this is based on performing pooled regression analyses to test 
additive and non-additive effects of race with non-racial social characteristics controlled. 
tabulations rarely include more than a few non-racial social characteristics at one
time, tabulations often provide only limited detail on non-racial social characteris-
tics, and researchers have no control over the sample universe for the tabulations.¹³

The new methods outlined here can help researchers get more out of these tradi- 
tional sources of data for segregation analysis. But the potential benefits of the new 
methods can be realized more fully and to greater effect if one draws on a new 
source of data for performing segregation analysis. The new source is restricted 
census datasets that contain individual-level data with detailed information about 
both individual social characteristics and also geographic information needed to 
pursue analyses of the residential attainment processes that produce segregation.¹⁴
Working with restricted access census files is difficult, time consuming, and expen- 
sive. But it also affords great opportunities. For example, it is conceivable that one 
could use the most recent files from the American Community Survey (ACS) or the 
American Housing Survey (AHS) to investigate segregation without having to rely 
on summary file tabulations. This is possible because the difference of group means 
formulation of segregation indices allows segregation scores to be estimated by the 
effect of race in city-specific individual-level models of residential attainment. So, 
if one has access to detailed micro data, one has tremendous flexibility to investigate 
segregation in a wide range of new ways. 

Additionally, because this approach allows for more efficient multivariate analy- 
sis, it expands the possibilities for investigating segregation reliably with smaller 
samples.15 This is not only relevant for using smaller samples such as are found in 
the ACS and AHS. It also raises the possibility of investigating segregation using 
non-census surveys.16 This is intriguing because non-census surveys can permit 
investigators to expand residential attainment analyses to consider variables such as 
individual racial attitudes, residential preferences, and other relevant measures that 
are not available in census datasets whether micro-data files or summary 
tabulations. 


13 For example, the attainment analyses I reported in the previous section found weak income 
effects based on a crude poverty/non-poverty distinction. Stronger income effects can be discerned 
using detailed income tabulations, but these tabulations do not include the other social characteris- 
tics in the analysis. 

14 A study by Bayer et al. (2004) takes a step in this direction by using restricted access census data 
to conduct refined individual-level analyses of residential contact. The framework I set forth here 
makes it possible to implement this kind of study to investigate uneven distribution. 


15 Significance tests and confidence intervals for the effect of race on residential attainments 
provide clear information about the reliability of segregation estimates obtained from analyses using 
smaller samples. 

16 Non-census surveys such as the Multi-City Study of Urban Inequality (MCSUI) can be used to 
study refined models of segregation so long as the households in the study are coded for area of 
residence at census geographies relevant for studying segregation (e.g., tract, block group, or 
block). Residential outcomes scored from census data can then be merged with the survey data to 
permit refined micro-level analyses of aggregated segregation. 

A series of recently completed studies by Amber Fox Crowell provides insight into 
what the future of research on residential segregation is going to look like.17 The 
primary focus of her research is on the factors determining White-Latino segregation. 
Her dissertation research (Fox 2014) presents detailed analyses investigating 
White-Latino segregation in six major metropolitan areas. The analyses draw on restricted 
micro-data files of the 2000 decennial census and the restricted micro-data files of 
the 2008-2012 American Community Survey. Crowell applies the methods dis- 
cussed in this work to the full potential that can be achieved with extant data. She 
measures residential outcomes at the level of census blocks and performs sophisti- 
cated quantitative analyses using the method of fractional regression to assess the 
impact of social and economic characteristics on White and Latino residential 
attainments. She then performs standardization and components analysis to assess 
the role of group differences in social and economic characteristics in explaining 
White-Latino residential segregation. Her studies present detailed results for analy- 
ses pertaining to segregation measured both using the separation index (S) and the 
dissimilarity index (D). I limit the presentation here to selected results from her 
analyses focusing on group separation (S) but note that the results for the dissimilar- 
ity index are similar in overall pattern. 

The most striking contribution of her research is her ability to investigate how a 
comprehensive set of social and economic characteristics shape residential out- 
comes for Latino households. The list of micro-level predictors and the estimated 
coefficients indicating their impact on the residential attainments of Whites and 
Latinos in Houston, Texas in 2000 and in 2010 is presented in Table 9.9. Results for 
other cities are not presented to conserve space, but the results for Houston give the 
full flavor of the analyses Crowell is able to conduct. Her attainment equations 
include a wide range of relevant predictors including age, level of education, house- 
hold income, military service, nativity and citizenship, year of immigration, English 
ability, marital/family status, and recent immigration experiences. No previous 
study has ever been able to take all of these factors into account simultaneously to 
quantitatively assess their impact on overall (city-level) residential segregation. 

The results reported in Table 9.9 show that all of the micro-level variables have 
statistically significant effects in both the equation for Whites and the equation for 
Latinos. The “centered” constant reported in the table is the expected value of 
contact with Whites when independent variables are set at reference categories (for 
categorical variables) or values (for interval variables). The coefficients reported are 
fractional effects. These are additive effects on the logit value of the mean for contact 
with Whites. Positive effects are seen for education, income, and English language 
ability, which all produce greater average contact with Whites. Negative effects are 


17 The studies originate with analyses reported in Dr. Crowell’s dissertation (Fox 2014) and were 
elaborated and extended in presentations at professional meetings (Crowell and Fossett 2015; Fox and 
Fossett 2014a, b). 

Table 9.9 Coefficients from fractional regressions predicting residential outcomes (y) determining 
the separation index (S) for White-Latino segregation in Houston, Texas in 2000 and 2010 


Whites Latinos 
Variable 2000 2010 2000 2010 
Degree (0-5) 0.1907" 0.1395* 0.2794" 0.22937 
Income (Ln) 0.0998° 0.0666° 0.0771° 0.0535? 
Military —0.0990° —0.1103# 0.1088 0.0828 
U.S.-born citizen (ref) - - - - 
Non-U.S. citizen −0.0981° −0.0873 −0.2506* −0.2318* 
Nat. U.S. citizen −0.0991* −0.1312? −0.0315 −0.0280 
Recent immigrant —0.1836* —0.0877 —0.1874* —0.0042 
English ability 0.2923 0.3097" 0.1679 0.2526* 
Age 30-59 (ref) - - - - 
Age 15-29 —0.1713" —0.1673" —0.1902? —0.1950* 
Age 60+ 0.15798 0.13868 —0.0025 0.0843" 
Married couple (ref) - - - - 
Single mother −0.2871 −0.3010* −0.1655* −0.2784* 
Other family —0.3715* —0.3325" —0.0489* —0.1283* 
Recent mover 0.09408 −0.0814" 0.2334" 0.0530° 
Constant —0.6652* —0.6285" —1.7437° —1.8607° 
Constant (centered) 1.5908° 1.2448* 0.0903? —0.1099* 


Notes: a denotes p<0.01 and b denotes p<0.05 
Source: Restricted microdata files from the 2000 decennial census and the 2008-2012 American 
Community Survey 


seen for foreign born status, non-citizen status, and recent immigration which all 
produce lower average contact with Whites. All of the effects are consistent with 
expectations from spatial assimilation theory. Group differences in the efficacy of 
the social and economic characteristics reflect the impact of minority status on con- 
tact with Whites. Altogether the results provide a wealth of information about the 
role of social and economic characteristics in shaping White and Latino residential 
outcomes and ultimately White-Latino segregation. 

The implications of the results for White-Latino segregation in Houston are sum- 
marized in Table 9.10 which also presents results for the other cities included in 
Crowell’s analyses. The results document that White-Latino differences in mean 
contact with Whites — the residential outcome that determines the value of the sepa- 
ration index (S) — vary across substantively relevant standardization scenarios. The 
scenario labeled “Latino group means & Latino rates of return” yields the predicted 
level of contact with Whites for Latinos in Houston given their observed 
distribution on the social and economic characteristics in the attainment equations. 
Similarly, the scenario labeled “White group means & White rates of return” yields 
the predicted level of contact with Whites for Whites in Houston given their 
observed distribution on the social and economic characteristics in the attainment 
equations. The difference between these two means yields the observed value of the 
separation index (S) for White-Latino segregation in Houston. That is, the value of 
42.1 in 2000 reflects the difference between the mean of 85.3 for Whites and the 
mean of 43.2 for Latinos and is reported in the column labeled “S*” under Houston 
on the first row of the panel reporting results for 2000. 

Scanning the values reported on this row of the table reveals that White-Latino 
separation varies greatly across the six cities in Crowell’s analysis. The separation 
index (S) is highest in Los Angeles (51.7) and only slightly lower in Houston (42.1) 
and Chicago (40.4). It is somewhat lower in Atlanta (23.9) and very low in Seattle 
(8.4). Drawing on methods reviewed earlier in this chapter, Crowell performed 
standardization analyses to address the question of whether White-Latino segregation 
is due to group differences in resources for residential attainment or to the 
impact of group status itself in the residential attainment process. In the interest of 
space, group distributions on predictors are not shown, but they are reported in 
Crowell’s studies. Not surprisingly, Latinos tend to have deficits on predictors that 
have positive effects on contact with Whites (e.g., income) and surpluses on predic- 
tors that reduce contact with Whites (e.g., non-U.S. citizen). 

The role of group differences in resources is documented in the row labeled 
“White group means & Latino rates of return.” The values reported here indicate 
how Latino residential outcomes would change if Latinos had the White “profile” 
on social and economic characteristics. The implications for S* show that the role 
of group differences assessed in this manner is always positive and substantively 
important. Equalizing Latino inputs to the residential attainment process reduces the 
value of S by between 34 and 61%. 

The role of minority status is documented in the row labeled “Latino group 
means & White rates of return” which indicates how Latino residential outcomes 
would change if Latinos experienced White rates of converting inputs to the attain- 
ment process into contact with Whites. The implications for S* show that the role 
of this factor also is always positive and substantively important. Indeed, equalizing 
Latino rates of return in the attainment process would reduce the value of S by 
between 74 and 89 %. 
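
The standardization scenarios described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Crowell's implementation: the covariates, the coefficient values, and the helper names `inv_logit` and `mean_contact` are hypothetical, and the sketch averages fractional-logit predictions over each group's simulated covariate distribution rather than working from restricted census files.

```python
import numpy as np

def inv_logit(z):
    """Map a logit-scale linear predictor to a proportion in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def mean_contact(X, beta):
    """Mean predicted contact with Whites (0-100) for households X under 'rates of return' beta."""
    return 100.0 * inv_logit(X @ beta).mean()

rng = np.random.default_rng(1)
n = 5000

# Hypothetical covariates: intercept, education degree (0-5), log household income.
X_white = np.column_stack([np.ones(n), rng.integers(2, 6, n), rng.normal(11.0, 0.6, n)])
X_latino = np.column_stack([np.ones(n), rng.integers(0, 4, n), rng.normal(10.4, 0.6, n)])

# Hypothetical fractional-logit coefficients ("rates of return") for each group.
beta_white = np.array([0.2, 0.15, 0.10])
beta_latino = np.array([-3.0, 0.25, 0.20])

# Observed scenario: each group keeps its own covariates and its own rates of return.
contact_white = mean_contact(X_white, beta_white)
contact_latino = mean_contact(X_latino, beta_latino)
s_obs = contact_white - contact_latino

# "White group means & Latino rates of return": equalize resources only.
s_resources = contact_white - mean_contact(X_white, beta_latino)

# "Latino group means & White rates of return": equalize rates of return only.
s_returns = contact_white - mean_contact(X_latino, beta_white)

print(f"Observed S:                    {s_obs:5.1f}")
print(f"S* with White resources:       {s_resources:5.1f} "
      f"({100 * (1 - s_resources / s_obs):.0f}% reduction)")
print(f"S* with White rates of return: {s_returns:5.1f} "
      f"({100 * (1 - s_returns / s_obs):.0f}% reduction)")
```

Because the model is nonlinear, the two reductions need not sum to 100 percent; each scenario answers a separate counterfactual question.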

I close this chapter by noting that the results presented in Tables 9.9 and 9.10 
provide a wealth of information warranting additional discussion. Unfortunately, a 
more detailed review is beyond the scope of the present discussion so I encourage 
interested readers to seek out Crowell’s research for more in-depth discussion of her 
findings. The central point I stress here is this. Crowell’s research shows that com- 
bining the new methods outlined in this monograph with the restricted census 
micro-data files opens the door to exciting new options for segregation analysis. 
Crowell’s research provides the best example to date of how segregation can be 
analyzed in great detail in a single-city analysis. In the next chapter I outline how 
this approach can be expanded to cover a larger sample of cities and explore the 
impact of city-level characteristics on residential segregation via estimation of 
multi-level models of residential attainments. 

Chapter 10 
New Options for Investigating Macro-level 
Variation in Segregation 


The previous chapter established that the difference of means framework for mea- 
suring segregation makes it possible to investigate segregation in a single city using 
individual-level models of residential attainment. The discussion in this chapter 
reviews how this approach can be extended to investigate ecological (i.e., aggregate- 
level) variation in segregation across cities and over time using multi-level models 
of individual residential attainments. The key is that ecological variation in segrega- 
tion can be investigated by assessing how the effect of race on segregation-relevant 
individual residential outcomes is conditioned by time and/or city characteristics. A 
central advantage of this approach is that it permits researchers to also include rel- 
evant non-racial social and economic characteristics in the micro-model. This 
allows effects of community characteristics to be estimated at the “zero order” level 
or “net” of controls for non-racial factors. It also can help overcome the risk of 
errors of inference that are likely to occur in aggregate-level analyses that attempt to 
control for relevant individual-level social and economic characteristics using 
aggregate-level indicators of group disparity on these variables. 


10.1 New Specifications for Conducting Comparative and/or 
Trend Analyses of Segregation 


Investigations of how segregation varies across metropolitan areas and over time are 
a staple of segregation studies. The new methods outlined here can be used to first 
exactly replicate earlier studies and then to extend them in new ways. Results from 
previous studies can be exactly replicated by estimating contextual and multi-level 
models where variation in segregation over time and across metropolitan areas is 
captured by assessing how the effect of race in individual-level models of residen- 
tial attainment varies with time and/or the ecological characteristics of metropolitan 
areas. For example, consider the question of how White-Black segregation 
measured by the index of dissimilarity (D) varies across cities based on city size 
(lnpop = the natural logarithm of total population) and relative minority size 
(rpb = the square root of proportion Black for the city population). 

Following the typical aggregate-level approach, cities are taken as units of analy- 
sis, D is calculated separately for individual cities, and scores for D then are taken 
as dependent variables (y) in a city-level OLS regression analysis that includes city 
size (lnpop) and relative minority size (rpb) as predictors as follows. 


$$ y_i = a_0 + a_1(\mathrm{lnpop}) + a_2(\mathrm{rpb}) $$


I estimated this equation using city-level data for the 201 metropolitan areas that 
had at least 50,000 total households and at least 2,000 Black households in the 2000 Census.1 I 
obtained the following results. 


$$ y_i = 55.92 + 3.90(\mathrm{lnpop}) + 4.79(\mathrm{rpb}) $$


These effect values also can be obtained from an individual-level, contextual 
model that takes as its dependent variable the individual-level residential outcome 
(y) relevant for D — that is, whether the (pairwise) proportion White (p) for the area 
in which the individual resides is equal to or greater than the city-wide figure (P) — 
and then also includes as a predictor the individual characteristic of race (coded 1 if 
White and 0 if Black) and appropriate interactions to capture how the effect of race 
on residential outcomes varies by city size (lnpop) and relative minority size (rpb). 
The result is the following individual-level OLS regression specification. 


$$ y_i = b_0 + b_1(\mathrm{race}) + b_2(\mathrm{lnpop}) + b_3(\mathrm{rpb}) + b_4(\mathrm{race} \cdot \mathrm{lnpop}) + b_5(\mathrm{race} \cdot \mathrm{rpb}) $$


I estimated this equation using individual-level data for the White and Black 
households in the same set of 201 metropolitan areas used in the aggregate analysis 
just reported above and I obtained the following results.2 


$$ y_i = 17.58 + 55.92(\mathrm{race}) - 1.67(\mathrm{lnpop}) + 12.75(\mathrm{rpb}) + 3.90(\mathrm{race} \cdot \mathrm{lnpop}) + 4.79(\mathrm{race} \cdot \mathrm{rpb}) $$


1 The data for the analyses are obtained from the tabulation of household income by race for census 
block groups in Tables 151 (A-I) distributed in Summary File 3 of the 2000 Census. 


2 One must give attention to how cases are weighted to replicate the unstandardized regression 
coefficients from the city-level regression. The city-level regression implicitly gives equal weight 
to each group’s mean for the relevant residential outcomes (y) in every city. To implement the same 
weighting scheme at the individual level, I first calculated each household’s proportionate share of 
the race-specific group total for the city in which it resided. I then multiplied these share values by 
2000, the minimum number of households for any group in any city, and used the resulting number 
as the case weight for the household. One may consider other weighting approaches at the indi- 
vidual level. But something along the lines of the approach just noted is required to exactly repro- 
duce the city-level regression coefficients. 

The results document that the additive and non-additive effects of race in the individual-level 
contextual regression correspond exactly to the coefficients in the city-level 
regression. Specifically, $b_1 = a_0 = 55.92$, $b_4 = a_1 = 3.90$, and $b_5 = a_2 = 4.79$. 
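
The correspondence between the two specifications can be checked on simulated data. The sketch below is illustrative only: the cities, group sizes, and outcome probabilities are invented, and NumPy least squares stands in for whatever software was used for the reported analyses. It does follow the weighting idea described in the note on case weights, giving each household its share of its group's city total, so that every group-by-city cell carries equal total weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical city characteristics (simulated values, not census data).
n_cities = 30
lnpop = rng.uniform(0.0, 4.0, n_cities)  # ln(population), centered on 12.0
rpb = rng.uniform(0.0, 0.5, n_cities)    # sqrt(proportion Black), centered on 0.10

city, race, y, weight = [], [], [], []
d_city = np.zeros(n_cities)  # city-level D = White mean of y minus Black mean of y

for c in range(n_cities):
    means = {}
    for r in (1, 0):  # 1 = White, 0 = Black
        n = int(rng.integers(50, 300))
        # Probability of living where p >= P; the race gap widens with city size.
        prob = 0.2 + 0.5 * r + 0.03 * lnpop[c] * (2 * r - 1)
        y_rc = 100.0 * rng.binomial(1, prob, n)
        means[r] = y_rc.mean()
        city.append(np.full(n, c))
        race.append(np.full(n, float(r)))
        y.append(y_rc)
        # Household weight = share of the group's city total, so each
        # group-by-city cell has the same total weight.
        weight.append(np.full(n, 1.0 / n))
    d_city[c] = means[1] - means[0]

city = np.concatenate(city)
race = np.concatenate(race)
y = np.concatenate(y)
weight = np.concatenate(weight)

# City-level OLS: D regressed on lnpop and rpb.
X_city = np.column_stack([np.ones(n_cities), lnpop, rpb])
a = np.linalg.lstsq(X_city, d_city, rcond=None)[0]

# Individual-level weighted least squares with race interactions.
X_ind = np.column_stack([
    np.ones(y.size), race, lnpop[city], rpb[city],
    race * lnpop[city], race * rpb[city],
])
sw = np.sqrt(weight)  # WLS via OLS on rows scaled by sqrt(weight)
b = np.linalg.lstsq(X_ind * sw[:, None], y * sw, rcond=None)[0]

print("city-level a0, a1, a2:      ", np.round(a, 4))
print("individual-level b1, b4, b5:", np.round(b[[1, 4, 5]], 4))
```

Multiplying every weight by a constant (such as the 2000 used in the note) leaves the coefficients unchanged; only the equal weighting of group-by-city cells matters for the exact match.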

I present the city-level and household-level regressions in Table 10.1. The table 
also includes regressions for the additive components that define D; namely, the 
percentages of Whites living in areas where proportion White exceeds the city pro- 
portion and the comparable percentage of Blacks. Inspection of the results shows 
that the effects of city size (lnpop) and relative minority size (rpb) on D can be 
traced to the differential effects these city characteristics have on the levels of resi- 
dential contact White and Black households have with Whites. 

The table also provides a parallel analysis for the separation index (S). As seen 
in the analysis for D, coefficients in the city-level regression for S map exactly onto 
coefficients in the individual-level contextual regression and the separate regressions, 
and the results for S can be traced to the differential effects city characteristics 
have on the residential contact Whites and Blacks have with Whites. 
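
The additive components referred to here rest on the difference-of-group-means identities for D and S. The following sketch checks both identities numerically on invented area counts (the arrays `white` and `black` are hypothetical). It compares the conventional computing formulas with the White-minus-Black difference in mean residential outcomes, scoring y as 1 when p is at or above P for D and as p itself for S.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical household counts for 200 areas (e.g., block groups) in one city.
white = rng.integers(1, 500, size=200).astype(float)
black = rng.integers(1, 500, size=200).astype(float)

W, B = white.sum(), black.sum()
total = white + black
p = white / total      # pairwise proportion White in each area
P = W / (W + B)        # city-wide pairwise proportion White

# Conventional computing formulas, scaled 0-100.
D_conventional = 100.0 * 0.5 * np.abs(white / W - black / B).sum()
S_conventional = 100.0 * (total * (p - P) ** 2).sum() / ((W + B) * P * (1.0 - P))

# Difference-of-group-means formulations.
y_D = (p >= P).astype(float)   # outcome for D: lives where p >= P
y_S = p                        # outcome for S: contact with Whites
D_means = 100.0 * ((white * y_D).sum() / W - (black * y_D).sum() / B)
S_means = 100.0 * ((white * y_S).sum() / W - (black * y_S).sum() / B)

print(f"D: conventional {D_conventional:.4f}, difference of means {D_means:.4f}")
print(f"S: conventional {S_conventional:.4f}, difference of means {S_means:.4f}")
```

The group means are computed here from area counts; the same values result from scoring each household with the outcome for its area and averaging within race, which is what makes the individual-level regressions possible.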

I next extended the analysis to do something that previously has not been possi- 
ble — namely, to investigate variation in segregation across cities and/or over time 
while simultaneously taking account of non-racial characteristics of households at 
the micro-level. This is possible because the summary file tabulations — namely, 
Table 151 (A-I) from Summary File 3 of the 2000 Census — provide the individual 
level data needed to perform this analysis. To accomplish the task, I re-estimated the 
contextual regressions reported in Table 10.1 after adding household income and the 
interaction of race and household income as predictors in the analysis. The results 
are presented in Table 10.2. 

The impact of race on White-Black differences in residential outcomes and how 
these differences vary with city characteristics are registered in the same way as 
before. But here the segregation effects, that is, the effects of race on residential 
outcomes, can be interpreted as being estimated net of the effects that income 
has on residential outcomes. In the model specification used here, higher income is 
seen to lead to greater contact with Whites, a finding consistent with results reported 
in the literature on residential attainment. But note that the introduction of the con- 
trol for income at the individual level has little impact on the effects of city size and 
relative minority size on segregation. That is, the impacts of city size and relative 
minority size on the coefficient for race in this analysis closely parallel the same 
effects observed for these variables in the city-level and individual-level contextual 
regressions that do not include individual income as a control variable. 

I do not present city-level regressions in Table 10.2 because aggregate specifica- 
tions cannot properly take account of the role of group differences in socioeconomic 
characteristics. I have outlined the general basis for this conclusion in an earlier 
paper focusing on the logically similar task of assessing the role of group differ- 
ences in education in city variation in racial income inequality (Fossett 1988). The 
conclusions of that methodological study apply with full force to the present situa- 
tion. That is, to correctly assess how group differences in socioeconomic attain- 
ments impact city variation in segregation, one must draw on data that disaggregates 

Table 10.1 Regression results illustrating that effects in city-level analyses of segregation and 
contact can be obtained using individual-level, contextual regressions predicting group differences 
in contact 


City-level regressions for dissimilarity index (D) D D contact D contact 
Pwo>p) Papsp) 
City-level effects on segregation (group contact difference) and group contact terms‘ 


City size (lnpop) (a1, b4) 3.90 2.23° −1.67° 

Relative minority size (rpb) (az, bs) 4.79 17.548 12.75* 

City-level intercept (ao, b;) 55.928 73.508 17.588 
Sample N 201 201 201 
Individual-Level, Contextual Regressions for y for D y for D y for D 
Dissimilarity Index (D) Pooled Whites Blacks 
City-level effects on segregation (group contact difference) and group contact terms‘ 

City size (Inpop) (a), b4) 3.90 2.23" —1.67° 

Relative minority size (rpb) (a2, bs) 4,798 17.548 12.75* 

City-level intercept (ao, b;) 55.92* 73.50* 17.58 
Additional individual-level effects 

City size (main effect Inpop, b2) -1.67" - - 

Relative minority size (main effect rpb, b;) 12.75 — — 

Individual-level intercept (bo) 17.58° — — 
Sample N 804,000° 402,000° 402,000° 
City-level regressions for separation index (S) S S Contact S Contact 

P*ww P*bw 

City-level effects on segregation (group contact difference) and group contact terms‘ 

City size (Inpop) (a;, b4) 5.83? 0.59° —5.24° 

Relative minority size (rpb) (a, bs) 70.51 —35.39* —105.90* 

City-level intercept (ao, b;) 11.76* 99.78" 88.02 
Sample N 201 201 201 
Individual-level, contextual regressions for y for S y for S y for S 
separation index (S) Pooled Whites Blacks 
City-level effects on segregation (group contact difference) and group contact terms‘ 

City size (Inpop) (a;, b4) 5.83" 0.59% —5.24* 

Relative minority size (rpb) (a2, bs) 70.51 —35.39* —105.90* 

City-level intercept (ao, b;) 11.76* 99.78 88.02 
Additional individual-level effects 

City size (main effect Inpop, b2) —5.24* - - 

Relative minority size (main effect rpb, b;) —105.90° — — 

Individual-level intercept (bo) 88.02° - - 
Sample N 804,000° 402,000° 402,000° 




Source: Summary File 3, Census 2000 

a p < 0.001 

b p < 0.01 

c Weighting of cases is described in the text 

d In the city-level regressions for D and S, the equation is $y = a_0 + a_1(\mathrm{lnpop}) + a_2(\mathrm{rpb})$ and the 
city-level effects are $a_0$, $a_1$, and $a_2$. In the individual-level, contextual regressions y is scaled pairwise 
contact based on y = f(p) as appropriate for D and S. The equation is 
$y = b_0 + b_1(\mathrm{race}) + b_2(\mathrm{lnpop}) + b_3(\mathrm{rpb}) + b_4(\mathrm{race} \cdot \mathrm{lnpop}) + b_5(\mathrm{race} \cdot \mathrm{rpb})$. 
City-level effects are captured by coefficients $b_1 = a_0$, $b_4 = a_1$, and $b_5 = a_2$. The variables lnpop and rpb 
are centered on values at the observed sample minimum, lnpop on 12.0 and rpb on 0.10, to make the 
regression intercepts substantively meaningful 


residential outcomes by race and socioeconomic status as is the case for the 
individual-level contextual regressions in Table 10.2. 

Due to the lack of viable alternative methods, past studies sometimes have 
instead adopted the approach of including aggregate (i.e., city-level) measures of 
group differences in socioeconomic status as control variables in analyses predict- 
ing segregation (e.g., Marshall and Jiobu 1975; Roof et al. 1976; Farley and Frey 
1994; Massey and Denton 1987). Unfortunately, this approach is flawed. As noted 
earlier, it can yield misleading results because it runs afoul of the “aggregate” or 
“ecological” fallacy in using aggregate-level measures to take account of the role of 
variables whose impact should properly be assessed at the micro level. I do not 
provide an extended discussion of the general issues since I have reviewed them 
in an earlier study (Fossett 1988). But I do highlight the practical significance of the 
problem by reporting analyses in Table 10.3 that replicate central findings reported 
in Fossett (1988) using an empirical example investigating cross-city variation in 
segregation. The first column in Table 10.3 reports results of conventional city-level 
regressions that predict D and S using city characteristics. The second column 
reports results of regressions that add a city-level measure of Black-White income 
inequality as a predictor. Many aggregate analyses of segregation have used similar 
model specifications motivated by the plausible conjecture that segregation between 
groups will be larger when their disparity on income is larger. 

The results of the aggregate regression suggest that group income differences have 
dramatic impacts on segregation. But this finding is contradicted by the results of the 
contextual analyses reported in Table 10.2. It also is at odds with results from the 
city-specific standardization exercises for Houston, Texas reported earlier in Table 9.4. 
These analyses controlled for socioeconomic characteristics at the individual level 
and the results indicated that socioeconomic characteristics were not generally 
important in shaping racial segregation. Specifically, these analyses indicated that 

Table 10.2 Analyses illustrating how city-level analyses of segregation can be conducted using 
individual-level, contextual regressions that control non-racial characteristics 


Regressions for Dissimilarity Index (D) y for D y for D 
City size (Inpop) (a;, b4) 3.90° 3:977 
Relative minority size (rpb) (a2, bs) 4.79° 2.248 
City-level intercept (ao, b;) 55.92 56.35* 

Additional individual-level effects 
City size (main effect lnpop, b2) −1.67* −2.39* 
Relative minority size (main effect rpb, b3) 12.75" 15.62* 
Income (bg) - 3.36* 
Race-income interaction (b7) - -1.16 
Individual-level intercept (bo) 17.58° 11.40° 

Regressions for Separation Index (S) y for S y for S 

Implied city-level effects on segregation & contact® 
City size (Inpop) (a;, b4) 5.83° 6.26° 
Relative minority size (rpb) (a2, bs) 70.51 68.05 
City-level intercept (ao, b;) 11.76° 15.40° 

Additional individual-level effects 
City size (main effect lnpop, b2) −5.24" −5.88° 
Relative minority size (main effect rpb, b;) —105.90# —103.33* 
Income (bs) - 3.00* 
Race-income interaction (b7) - —2.28 
Individual-level intercept (bo) 88.02* 82.518 

Sample N 804,000° 804,000° 

Source: Summary File 3, Census 2000 
a p<0.001 

b Weighting of cases is described in the text 

c In individual-level, contextual regressions for D and S the specification is 
$y = b_0 + b_1(\mathrm{race}) + b_2(\mathrm{lnpop}) + b_3(\mathrm{rpb}) + b_4(\mathrm{race} \cdot \mathrm{lnpop}) + b_5(\mathrm{race} \cdot \mathrm{rpb}) + b_6(\mathrm{income}) + b_7(\mathrm{income} \cdot \mathrm{race})$. 
Implied city-level effects are $b_1$, $b_4$, and $b_5$. To make intercepts substantively meaningful, lnpop and rpb 
are centered on values near the low end of the observed sample distribution; specifically, lnpop is 
centered on 12.0 and rpb is centered on 0.1 


White-Black differences in residential contact with Whites (coded to reflect how 
contact determines values of D and S) decrease only modestly when White-Black 
differences in socioeconomic characteristics are taken into account at the individual- 
level; that is, by drawing on micro-level data to standardize the White-Black 
comparison to take account of the impact of group differences in income separately 
in each city based on the city-specific race-income-residence relationship at the indi- 
vidual level for that city. 

The third column of Table 10.3 presents city-level regressions that replicate 
another finding reported in Fossett (1988). The dependent variables for these analy- 
ses, D* and S*, are values of D and S that have been “standardized” so they repre- 
sent differences in residential outcomes between Whites and Blacks with identical 

Table 10.3 Analyses illustrating how city-level analyses of segregation can yield misleading 
results when aggregate measures are used to control for group differences on non-racial 
characteristics 


City-level regressions for White-Black segregation — Observed Dissimilarity (D) and 
Standardized Dissimilarity (D*) 


D D D* 

Unstandardized regression coefficients 

City size (Inpop) 3.90* 4.23" 4.438 

Relative minority size (rpb) 4.79 —10.36° 6.79 

Ratio of mean incomes (B/W) - —40.29* —29.43° 
Standardized regression coefficients 

City size (Inpop) 0.3998 0.423" 0.4278 

Relative minority size (rpb) 0.063 —0.136° —0.086 

Ratio of mean incomes (B/W) - —0.538* —0.379" 
Sample N 201 201 201 


City-level regressions for White-Black segregation — Observed Separation Index (S) and 
Standardized Separation Index (S*) 


S S S* 
Unstandardized regression coefficients 
City size (Inpop) 5.83* 6.32* 6.24" 
Relative minority size (rpb) 70.51% 48.04* 52.01 
Ratio of mean incomes (B/W) - —59.75° —45.37° 
Standardized regression coefficients 
City size (Inpop) 0.360" 0.391 0.406" 
Relative minority size (rpb) 0.571 0.389" 0.444" 
Ratio of mean incomes (B/W) — —0.494* —0.396° 
Sample N 201 201 201 
Source: Summary File 3, Census 2000 
a p<0.001 
b p<0.01 
c p<0.05 


income distributions. Drawing on techniques discussed earlier in Chapter 9, the 
standardization is accomplished by calculating D* and S* from predicted means on 
segregation-relevant residential outcomes for Whites and Blacks with identical lev- 
els of income based on city- and group-specific individual-level regressions of resi- 
dential outcomes on income. Since D* and S* reflect White-Black differences in 
residential outcomes for families that have identical levels of income, city variation 
in D* and S* cannot be attributed to city variation in group income differences. 
Nevertheless, the city-level measure of racial income inequality continues to have 
very strong and statistically significant effects on D* and S* in the city-level 
regressions. 

There is a ready explanation for this result. It is that the aggregate-level associa- 
tion of segregation and socioeconomic differences reflects much more than the nar- 
row impact of racial income differences on racial differences in residential outcomes. 
Drawing on arguments set forth in more detail in Fossett (1988), I suggest that the 
strong effect of racial income inequality in this equation is misleading and primarily 
reflects a spurious association. My interpretation is guided by the simple hypothesis 
that aggregate racial inequalities in all important areas of socioeconomic attainment 
are likely to co-vary because they all are likely to share a common cause; they vary 
together based on the general salience of race and minority disadvantage in socio- 
economic attainment dynamics in the community stratification system. To the extent 
that this is so, racial segregation and racial income inequality will be correlated at 
the aggregate level even when White-Black income differences play a minor role in 
shaping White-Black residential segregation. The regression results reported in col- 
umn 3 are consistent with this hypothesis. This in turn indicates that the strong 
effects of racial income inequality in the regression results reported in column 2 are 
misleading. 
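
The common-cause argument can be illustrated with a small simulation. Everything in the sketch below is hypothetical: a latent city-level variable `z` stands in for the general salience of race, and it drives both the Black/White income ratio and the racial gap in contact with Whites. By construction income has no effect on any household's residential outcome, yet the city-level regression still shows a large negative "effect" of the income ratio on segregation.

```python
import numpy as np

rng = np.random.default_rng(3)
n_cities, n = 60, 200  # n households per group in each simulated city

# Hypothetical common cause: general salience of race in each city.
z = np.clip(rng.normal(0.0, 1.0, n_cities), -2.0, 2.0)

s_city = np.empty(n_cities)        # separation index S for each city
income_ratio = np.empty(n_cities)  # Black/White ratio of mean incomes
income_coef = np.empty(n_cities)   # within-city micro-level income effect

for c in range(n_cities):
    inc_w = rng.normal(60.0, 15.0, n)                        # White income ($1000s)
    inc_b = rng.normal(60.0 * (0.7 - 0.08 * z[c]), 15.0, n)  # Black income ($1000s)

    # Contact with Whites depends on race only; income plays NO role here.
    y_w = 30.0 + (40.0 + 8.0 * z[c]) + rng.normal(0.0, 5.0, n)
    y_b = 30.0 + rng.normal(0.0, 5.0, n)

    s_city[c] = y_w.mean() - y_b.mean()
    income_ratio[c] = inc_b.mean() / inc_w.mean()

    # City-specific micro regression of y on race and income.
    X = np.column_stack([
        np.ones(2 * n),
        np.r_[np.ones(n), np.zeros(n)],
        np.r_[inc_w, inc_b],
    ])
    income_coef[c] = np.linalg.lstsq(X, np.r_[y_w, y_b], rcond=None)[0][2]

# City-level regression of S on the income ratio.
X_city = np.column_stack([np.ones(n_cities), income_ratio])
slope = np.linalg.lstsq(X_city, s_city, rcond=None)[0][1]

print(f"Aggregate slope of S on B/W income ratio: {slope:.1f}")
print(f"Mean within-city income coefficient:      {income_coef.mean():.4f}")
```

Because income is irrelevant to residential outcomes in this setup, an income-standardized S* would equal S in every city, and the aggregate association would persist for S* as well, which mirrors the pattern reported for column 3.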

The interpretation I offer regarding the effect of income inequality in aggregate- 
level regressions predicting segregation is at odds with the usual interpretation 
offered in aggregate-level studies of segregation. But it is consistent with findings 
from micro-level studies of the role of group income differences in shaping White- 
Black segregation. Studies that draw on micro-data that disaggregate residential 
outcomes by income and race simultaneously consistently report that White-Black 
income differences are not a major factor contributing to segregation between the 
groups. For example, analyses performed for individual cities typically report that 
index scores for White-Black segregation are as high when computed for house- 
holds that are matched on income (or other socioeconomic characteristics) as when 
computed for the full populations (Farley 1977; Denton and Massey 1988; Massey 
and Fischer 1999). I observe the same pattern in the city-specific income 
standardization exercises that generated the D* and S* index scores used in the 
aggregate analyses reported here. 

In sum, then, there is little available evidence from analysis of detailed micro- 
data for individual cities to indicate that White-Black income differences play a 
major role in producing residential separation of Whites and Blacks. The reason is 
simple; Whites at all income levels tend to live apart from Blacks at all income lev- 
els. Findings of this sort based on analysis of disaggregated micro-data should be 
seen as more compelling than findings from aggregate correlations of racial income 
differences and racial segregation. Researchers seeking to properly assess the 
impact of group differences in non-racial characteristics (e.g., income) on segrega- 
tion must directly examine how residential attainments vary with those characteris- 
tics separately by race in each community using disaggregated micro-level data. 
The framework for studying segregation set forth here allows researchers to investi- 
gate these questions in a methodologically sound way. It allows them to assess the 
role of group differences on non-racial characteristics such as income using 
individual-level contextual models of attainment. The alternative approach of 
including measures of socioeconomic inequality in city-level analyses of segrega- 
tion is flawed and prone to yielding misleading results as seen in the example here. 
It should be abandoned. 
Chapter 11 
Aspatial and Spatial Applications of Indices 
of Uneven Distribution 


The difference of means formulations of indices of uneven distribution make it 
relatively straightforward to implement segregation measurement in either conven- 
tional aspatial formulations or in spatial formulations. Aspatial versions of segrega- 
tion indices are familiar because they are widely used in empirical studies. They are 
obtained by applying any of the computing formulas reviewed here using data for 
non-overlapping, bounded areas such as census tracts, block groups, or blocks. It is 
appropriate to designate the resulting scores as “aspatial” because the spatial 
arrangements of the units (e.g., blocks, block groups, tracts) have no implications 
for the scores obtained. Spatial formulations would differ on this key point; namely, 
the spatial arrangement of units can potentially impact index scores. 

In truth, opportunities to compute spatial versions of indices of uneven distribu- 
tion have always existed. But apparently this has not been widely appreciated. Or, 
more carefully, researchers have rarely taken advantage of this possible option. One 
simple way to implement popular indices of uneven distribution in either aspatial or 
spatial versions is to use computing formulas that give index scores as population 
averages for area-specific residential outcomes. Figures in Appendices present for- 
mulations of this type for all popular indices of uneven distribution. Here I note only 
two such formulations, one for D and one for S. Both take the general form 
$100 \cdot (1/T) \cdot \sum y_i$ where $y_i$ is a residential outcome for individuals scored on the basis 
of their area of residence. The value of D can be obtained using $y_i = |p_i - P| / 2PQ$ 
and the value of S can be obtained using $y_i = (p_i - P)^2 / PQ$. 
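
As a minimal sketch of these two formulations (an illustration, not code from the original analyses), the Python function below computes D and S as population averages of y. The function name `d_and_s` and the per-area count lists `white` and `black` are hypothetical; P and Q are the city-wide proportions for the two groups in the comparison.

```python
def d_and_s(white, black):
    """Return (D, S) on a 0-100 scale from per-area counts for two groups.

    Each index is computed as 100 * (1/T) * sum(y), where y is scored for
    every individual from the proportion White (p) in the area of residence.
    """
    T = sum(white) + sum(black)      # combined population of the city
    P = sum(white) / T               # city proportion White
    Q = 1.0 - P                      # city proportion Black
    sum_y_d = 0.0
    sum_y_s = 0.0
    for w, b in zip(white, black):
        t = w + b
        if t == 0:
            continue                 # empty areas contribute no individuals
        p = w / t                    # area proportion White
        # All t residents of the area share the same value of y.
        sum_y_d += t * abs(p - P) / (2 * P * Q)
        sum_y_s += t * (p - P) ** 2 / (P * Q)
    return 100 * sum_y_d / T, 100 * sum_y_s / T
```

For two areas with counts `white=[80, 20]` and `black=[20, 80]`, this gives D = 60 and S = 36, up to floating-point rounding.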

If y and p are calculated using only the data for the block the individual resides 
in, the calculations will yield the usual index score which is aspatial because how 
individual blocks are arranged in space has no impact on index scores. However, if 
one chooses to do so, one can calculate y and p based on spatially defined neighbor- 
hoods. For example, one could define the neighborhood as a “first order” contiguity 
neighborhood based on combining data for the block the individual resides in and 
also the blocks that are adjacent to that block. This is the only modification that is 
required; all other steps in the calculations remain the same. 
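
The contrast between the two calculations can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the analyses cited in this chapter: the function name `separation_index`, the `neighbors` dictionary (block index mapped to the indices of adjacent blocks), and the four-block example are all hypothetical. When `neighbors` is omitted, p is scored from the block alone (aspatial); when it is supplied, p is scored from the block pooled with its adjacent blocks, and nothing else changes.

```python
def separation_index(white, black, neighbors=None):
    """Return S (0-100) with p scored from blocks or from neighborhoods."""
    n = len(white)
    T = sum(white) + sum(black)
    P = sum(white) / T
    Q = 1.0 - P
    total = 0.0
    for i in range(n):
        t = white[i] + black[i]          # residents of the focal block
        if t == 0:
            continue
        # The neighborhood is the focal block plus any adjacent blocks.
        members = [i] + (list(neighbors[i]) if neighbors else [])
        w_nb = sum(white[j] for j in members)
        b_nb = sum(black[j] for j in members)
        p = w_nb / (w_nb + b_nb)         # neighborhood proportion White
        total += t * (p - P) ** 2 / (P * Q)
    return 100 * total / T


# Four blocks of 100 residents arranged in a row; adjacency is left/right.
row = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clustered = ([100, 100, 0, 0], [0, 0, 100, 100])    # W W B B
alternating = ([100, 0, 100, 0], [0, 100, 0, 100])  # W B W B
```

Both arrangements yield the same aspatial score because every block is homogeneous, whereas the first-order neighborhood version scores the clustered arrangement well above the alternating one.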
This neighborhood formulation makes the index “spatial” because how blocks 
are arranged in space will now potentially affect index scores. The key change is 
that an individual’s neighborhood has shifted from being equated with a discrete 
“bounded” area that applies only to individuals in the area to a spatially-defined 
region that in some degree is shared with individuals in adjacent areas. I ignore the 
fact that the size of the neighborhood has changed because it is not a fundamental 
issue. It can be rendered irrelevant by defining bounded areas and spatially defined 
areas to be comparable in size. 

Following this example, it is obvious that difference of means formulations of 
indices also can be implemented as either spatial or aspatial. The key terms that 
determine the index scores are individual residential outcomes (y) that are scored 
from area group proportion (p). Calculate p for bounded areas and the index is aspa- 
tial; calculate p for spatially defined areas and the index is spatial. Assessment of 
group means and associated segregation index scores is easy to accomplish either 
way and results will be spatial or aspatial depending on this choice of how area 
group proportions are calculated. I have drawn on these options when conducting 
simulation studies of segregation dynamics using the SimSeg simulation model 
(Fossett and Waren 2005; Fossett and Dietrich 2008; Clark and Fossett 2008; Fossett 
2011a) and also in applications using block-level census data to assess segregation 
using neighborhoods that vary in spatial scale (Fossett 2011b). 

Spatial and aspatial implementations of indices are both potentially interesting. 
However, my own experience in empirical analyses has been that they rarely yield 
different substantive findings when they are implemented at spatial scales that yield 
comparable neighborhood-level population counts. But it is logically possible that 
they might yield different findings in some circumstances. For example, one can 
imagine that some administrative boundaries (e.g., school district lines, city 
boundaries, zoning areas, etc.) and/or urban ecological barriers (e.g., highways, road 
patterns, rivers, etc.) could delimit sociologically meaningful spatial domains that 
are sharply “bounded” based on the impact of physical barriers or administrative 
boundaries on social interaction. In the extreme case, racial composition in adjacent 
areas would not matter because social interaction and common residential fate are 
determined solely inside the boundaries of the spatial units used. In practice, how- 
ever, boundaries for the spatial units used most often in segregation research can be 
somewhat arbitrary and spatially defined areas may potentially correspond more 
closely to sociologically meaningful neighborhoods. For example, a block located 
near the boundary of a census tract may have more in common with the nearby blocks 
in an adjacent tract than with blocks on the far side of the same tract. So both 
approaches can be defended on conceptual grounds. 

Again, there is as yet little evidence to indicate that the choice between spatial 
and aspatial implementations of segregation indices carries compelling practical 
consequences for findings regarding aggregate segregation patterns. However, I dis- 
cuss the issue here because I can think of at least one practical reason for investiga- 
tors to consider using spatially defined neighborhoods. It is for studying segregation 
involving smaller groups and segregation in smaller communities (e.g., small cities 
and CBSAs). I noted earlier that in conducting analyses of segregation in CBSAs I 
have found that census tracts and even census block groups can be too large to 
capture segregation patterns in smaller CBSAs. In particular, I find that tracts and block 
groups are not well suited for studying segregation involving smaller populations — 
for example, studying segregation for Latinos in areas of recent settlement. Among 
available census geographies, that leaves census blocks as the best option to use for 
computing standard aspatial segregation indices. However, some researchers might 
worry that census blocks are too small. One way to address this concern is to assess 
segregation using first- or second-order spatial neighborhoods based on block data. 
These would meet the needs of using spatial units that are small enough to capture 
segregation in smaller communities and for smaller groups while at the same time 
being potentially more appealing with regard to reflecting sociologically meaning- 
ful neighborhoods. 
Chapter 12 
Relevance of Individual-Level Residential 
Outcomes for Describing Segregation 


The new options for segregation analysis introduced here suggest a new basis for 
evaluating segregation indices — the substantive relevance of the individual-level 
residential outcomes registered by the indices. Three of the segregation indices con- 
sidered here — G, D, and S — have been used widely in empirical analyses for more 
than five decades and each has been reviewed many times in methodological studies.¹ 
Until now little attention has been given to the substantive relevance of the individ- 
ual-level residential outcomes each index registers. In this chapter I argue that it is 
instructive to consider how indices differ on this important point of comparison. 

In their difference of means formulations G, D, and S register group differences 
on average residential outcomes (y) scored from (pairwise) area group proportion 
(p). G rescales p to register relative rank or percentile scoring. D rescales rank dis- 
tinctions on p to register only a 0, 1 coding of whether or not p is above the city 
average (P). S does not rescale p; it registers it in its original metric. Because S 
registers p directly, a given value of p yields the same value of y in all cities. In 
contrast, G and D assign values of y based on monotonic, rank position scoring 
schemes that vary in functional form in complex ways across cities. In particular, 
the scoring of y from p is nonlinear and the magnitude of the departure from linear- 
ity varies with city racial mix (as discussed previously in Chap. 5). Consequently, 
identical values of area racial proportion (p) can be and often are assigned very dif- 
ferent values on the residential outcome of scaled contact (y) in different cities. 
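
The three scoring schemes can be compared side by side in a short Python sketch. It rests on assumptions of the sketch rather than on the original computing formulas: G is taken as twice the White-Black difference in mean midpoint percentile rank on p (in line with the references to G/2 in this chapter), D as the difference in the share of each group living in areas where p exceeds P, and S as the difference in mean p. The function name `scored_differences` and its inputs are hypothetical.

```python
def scored_differences(white, black):
    """Return (G, D, S) on a 0-100 scale as differences of group means on y."""
    W, B = sum(white), sum(black)
    T = W + B
    P = W / T
    # (p, White count, Black count) for each populated area.
    areas = [(w / (w + b), w, b) for w, b in zip(white, black) if w + b > 0]

    # Population living at each distinct value of p, and population below it.
    tied = {}
    for p, w, b in areas:
        tied[p] = tied.get(p, 0) + w + b
    below = {}
    cumulative = 0
    for p in sorted(tied):
        below[p] = cumulative
        cumulative += tied[p]

    def mean_difference(score):
        mean_w = sum(w * score(p) for p, w, b in areas) / W
        mean_b = sum(b * score(p) for p, w, b in areas) / B
        return mean_w - mean_b

    # G: midpoint percentile rank on p (city-specific rescaling).
    g = 2 * mean_difference(lambda p: (below[p] + tied[p] / 2) / T)
    # D: 0/1 coding of whether p is above the city proportion P.
    d = mean_difference(lambda p: 1.0 if p > P else 0.0)
    # S: p in its original metric.
    s = mean_difference(lambda p: p)
    return 100 * g, 100 * d, 100 * s
```

Only the scoring function passed to `mean_difference` changes across the three indices; the scoring for S does not depend on the city, while the scoring for G and D depends on the city's distribution of p and on P.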

Residential outcomes (y) registered by S — for example, area proportion White 
(p) in the familiar case of White-Black segregation — have clear substantive appeal. 
The residential outcome of group contact in its “natural” metric is directly meaning- 
ful to individuals and households both in its own right and also because area propor- 
tion White (p) tends to correlate with neighborhood characteristics that have clear 
relevance for life chances (e.g., crime rates, quality of schools, neighborhood ser- 
vices and amenities, property values, etc.). The same cannot be said for the 


¹ I bring R and H into the discussion later in this chapter. 
neighborhood outcomes (y) used in computing scores for G and D. G rescales 
values of p into ordinal-level, relative rank (percentile) scores. D collapses values of p 
to just two relative rank scores. 

The value and sociological relevance of scoring residential outcomes (y) as G 
and D do is not obvious. Few if any discussions of group differences in residential 
outcomes explicitly prioritize ordinal position on contact with Whites over contact 
with Whites in its natural metric. Similarly, discussions that view area proportion 
Whites as relevant for the impact of area of residence on life chances rarely if ever 
suggest that this is best captured by coding area proportion White in terms of rela- 
tive rank position or in terms of “parity.” To the contrary, theories of majority group 
discrimination and avoidance of minority groups usually presume that exclusionary 
discrimination by Whites and White avoidance of minority areas is aimed at main- 
taining neighborhoods as predominantly-majority (e.g., 85% White or higher) 
rather than simply “above parity” in comparison to proportion White in the city 
which of course can vary dramatically across cities. In view of this, I believe there 
is no compelling basis for giving “relative rank” scoring of p or “above parity” scor- 
ing of p priority over the natural interval metric for p. 

S also is attractive because it has clear, straightforward implications that are easy 
to explain to general audiences. For example, if White-Black segregation as mea- 
sured by S is high — say 60 — it means Blacks’ average contact with Whites is 60 
points lower than Whites’ average contact with Whites. This yields an unambiguous 
signal about the consequences of segregation for individuals and groups; it indicates 
that, on average, Whites live in predominantly White neighborhoods and Blacks 
live in predominantly Black neighborhoods. This score on S also sends a signal 
about the extent to which Whites and Blacks can potentially experience differences 
on life chances based on neighborhood characteristics that correlate with area pro- 
portion White. When S is zero, Whites and Blacks will necessarily experience the 
same average on all residential outcomes. As S increases above zero, so too does the 
potential for Whites and Blacks to experience differences on other important resi- 
dential outcomes (e.g., crime, poverty, schools, amenities, etc.). 

The simple and clear conclusions one can draw based on knowing that S takes a 
high score do not necessarily hold when G and D take high scores. To the contrary, 
as discussed in Chaps. 7 and 8, it is possible for G and D to be very high — say 80 — 
and for both Whites and Blacks to live in neighborhoods that on average are similar 
on area proportion White (p). In these cases it could be highly misleading to assume 
that high scores on G and D carry consequences for group differences on neighbor- 
hood outcomes that are relevant for life chances (e.g., crime, poverty, schools, ame- 
nities, etc.) and are correlated with area proportion White. The reason for this is 
simple. If Whites and Blacks experience similar outcomes on area proportion White, 
they will, all else equal, tend to experience similar outcomes on factors that are cor- 
related with area proportion White. 
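
A small constructed example makes the arithmetic concrete. The city below is hypothetical and chosen only to illustrate the point: it is 98% White, and every Black resident lives in an area that is 90% White. The Python sketch scores D and S as differences of group means, with the outcome for D coded 1 when area proportion White exceeds the city proportion and 0 otherwise, following the description earlier in this chapter.

```python
# Hypothetical city: 10 areas with 90 Whites and 10 Blacks each,
# plus 40 areas with 100 Whites and no Blacks.
white = [90] * 10 + [100] * 40
black = [10] * 10 + [0] * 40

W, B = sum(white), sum(black)
P = W / (W + B)                       # city proportion White (0.98)


def group_means(score):
    """Mean of score(p) for Whites and for Blacks across areas of residence."""
    mean_w = sum(w * score(w / (w + b)) for w, b in zip(white, black)) / W
    mean_b = sum(b * score(w / (w + b)) for w, b in zip(white, black)) / B
    return mean_w, mean_b


contact_w, contact_b = group_means(lambda p: p)            # natural metric
parity_w, parity_b = group_means(lambda p: 1.0 if p > P else 0.0)

S = 100 * (contact_w - contact_b)     # about 8.2
D = 100 * (parity_w - parity_b)       # about 81.6
```

Whites live, on average, in areas that are about 98% White and Blacks live in areas that are 90% White, so S is near 8 while D exceeds 80.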

The mathematical basis for how G and D can take high values when Whites and 
Blacks share similar neighborhood outcomes was discussed earlier in Chap. 5. It 
was illustrated in the graphs in Fig. 5.1 which depict how G and D register group 
differences in contact with Whites (p) after p has been subjected to a dramatic 
nonlinear rescaling. This nonlinear rescaling of p reduces the importance of group 
differences on contact with Whites (p) over some ranges of p and it exaggerates the 
importance of group differences on p over other ranges of p. In the graph for the 
White-Asian comparison in Fig. 5.1, for example, group differences in p over the 
range of 0-80 are of minor importance while group differences on p over the range 
80-100 take on great importance. D is even more extreme in this regard; it rescales 
values of p into values of y based on a one-step function that registers only differ- 
ences on either side of P. The discussion in Chap. 7 further outlined the technical 
basis for how these characteristics of G and D create the possibility that they can 
and often will take values that differ substantially from values of S. 

The key implication is that high values on G and D have uncertain implications 
for group differences on sociologically meaningful residential outcomes because 
values of D and G can be highly sensitive to small differences in area group propor- 
tion. Specifically, G and D for White-Black segregation can in principle take very 
high values when Whites and Blacks live in neighborhoods that, on average, are 
fundamentally similar on area proportion White (p) and other sociologically impor- 
tant neighborhood outcomes. 

This may be surprising to some. If so, it is instructive to carefully consider the 
familiar interpretation of D as indicating the minimum proportion of one group that 
must move to bring about even distribution. Note that this interpretation implies 
nothing specific about whether the residential movement that eliminates uneven 
distribution as measured by D will cause either group’s residential outcomes to 
change in sociologically important ways. In fact, the movement associated with 
eliminating a high value for D can and sometimes will produce small, potentially 
trivial, average changes in substantively meaningful residential outcomes for the 
members of a group. 

This frames a point of clear contrast between G and D on the one hand and S on 
the other. High values of S always signal that residential movement needed to bring 
about even distribution will produce dramatic changes in group differences in resi- 
dential outcomes. This is not necessarily true for G and D. This is a consequence of 
the fact that high values of G and D can occur under “prototypical segregation,” 
which involves high levels of group separation, but also under “dispersed displace- 
ment” or “displacement without separation” as discussed in Chap. 7 (Fig. 7.1). 

The potential for G and D to manifest this characteristic is not uniform across all 
circumstances. It varies dramatically with relative group size. The underlying tech- 
nical basis for this was reviewed in Chap. 7 and the logically possible consequences 
for D-S differences were summarized graphically in Figs. 7.4 and 7.5. The implica- 
tions for empirical analyses also were illustrated in the comparison of the function 
y=f (p) for G/2 in the graphs in Fig. 5.1. The graph for the White-Latino compari- 
son has the mildest nonlinearity in the scoring of y from p because it has the most 
balanced group ratio of 68/32. In contrast, the group ratio of 92/8 for the White- 
Asian comparison is much more imbalanced and the nonlinearity is much more 
pronounced in the graph for this comparison. The White-Black comparison is in 
between on both the group ratio of 76/24 and the nonlinearity of the y—p 
relationship. 
Fig. 12.1 Response of group contact (y) scored for Hutchens R by proportion White in area (p) 
and selected values for city proportion White (P). Curves reflect the response of Hutchens’ R to a 
change in area proportion White (p) by level of p and selected values of proportion White for the 
city (P). $y = f(p) = Q + (1 - \sqrt{pq/PQ}) / (p/P - q/Q)$. Moving from darker curves to lighter curves, 
the values of P are: 0.01, 0.05, 0.20, 0.50, 0.80, 0.95, and 0.99. The horizontal line is for reference 
and reflects the “flat” response of the separation index (S) 


Further insight into these patterns can be gained by again considering the behav- 
ior of the function y=f (p) for the Hutchens square root index (R) shown earlier in 
Fig. 6.6. The y—p relationship for R is continuous and thus lends itself more easily 
to mathematical and graphical analysis than the y—p relationship for G which is 
mathematically less tractable because it is based on a percentile transformation. In 
other key respects, however, R and G are quite similar: the y—p relationships for 
both R and G have similar nonlinear forms (i.e., both follow a backwards S curve); 
the nonlinearity in the y—p relationships for both R and G become more pronounced 
when group size is more imbalanced; and R and G are closely correlated in empiri- 
cal data sets. 

The graph in Fig. 6.6 plots the function y=f (p) for R over selected values of 
city racial mix (P). The graph documents that y=f (p) for R is always a continu- 
ously rising backwards S curve. The nonlinear nature of the y—p relationship means 
that R responds to differences on p in a much different way than S. S registers dif- 
ferences arithmetically according to p’s original, “natural” metric. R responds more 
strongly to differences on p over ranges of p where the curve is relatively “steep” 
and less strongly to differences on p over ranges of p where the curve is relatively 
“flat”. The graph also reveals that where steep and flat regions of the curve occur 
over the range of p is strongly conditioned by the racial mix of the city (P). When 
the two groups are balanced, the y—p curve for R is symmetric and differences 
between how R and S respond to p are modest. When group size is imbalanced, the 
y—p curve for R becomes asymmetric and more profoundly nonlinear. Under these 
conditions, the differences between how R and S respond to p can be dramatic. 
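
The scoring function for R can also be checked numerically. The Python sketch below assumes the expression given in the caption to Fig. 12.1, read as y = Q + (1 - sqrt(pq/PQ)) / (p/P - q/Q), and assumes that R can be computed directly as one minus the sum over areas of sqrt((w/W)(b/B)). The function names are hypothetical. At p = P the expression takes the form 0/0; the sketch substitutes the limiting value 1/2, which does not affect the result because areas at parity contribute nothing to the difference of means.

```python
import math


def y_hutchens(p, P):
    """Score y = f(p) for Hutchens' R given city proportion White P."""
    Q = 1.0 - P
    q = 1.0 - p
    gap = p / P - q / Q
    if abs(gap) < 1e-12:
        return 0.5                       # limiting value as p approaches P
    return Q + (1.0 - math.sqrt(p * q / (P * Q))) / gap


def r_direct(white, black):
    """R computed directly from the two groups' distributions across areas."""
    W, B = sum(white), sum(black)
    return 1.0 - sum(math.sqrt((w / W) * (b / B)) for w, b in zip(white, black))


def r_from_group_means(white, black):
    """R computed as the White-Black difference in mean y."""
    W, B = sum(white), sum(black)
    P = W / (W + B)
    scored = [(w, b, y_hutchens(w / (w + b), P))
              for w, b in zip(white, black) if w + b > 0]
    mean_w = sum(w * y for w, b, y in scored) / W
    mean_b = sum(b * y for w, b, y in scored) / B
    return mean_w - mean_b
```

Under these assumptions the two functions agree up to floating-point error for any set of area counts in which both groups are present in the city, which is the sense in which R is a difference of group means on a nonlinear rescaling of p.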

This is documented further in Fig. 12.1 which depicts graphically how changes 
in p are registered as changes in y as scored for R. The graph on the left uses the 
original metric scoring of y; the graph on the right uses a natural log scale on the y 
axis. These two graphs make it clear that R responds more strongly to changes in p 
near the extremes of p and this tendency becomes more asymmetric and more dra- 
matic when the racial composition of the city departs from balance (i.e., 50/50). 
This establishes the mathematical basis for how and when R (and G and D) can take 
high values when S is low. Regarding the “how” part of the story, R (and G and D) 
can take high values when S is low by responding dramatically to very small differ- 
ences on p. Regarding the “when” part of the story, the potential for R to depart from 
S is greatest when the city racial mix (P) is highly imbalanced. 

It is clear from these results that R must be high when S is high, but R can be 
either high or low when S is low. As noted earlier, this also applies with equal force 
to G and D. Thus, if S is high, G and D must be high, but when S is low D and G can 
be either high or low. This is consistent with results presented earlier in Figs. 8.1 and 
8.2, which plotted scores for D against scores for S (and vice 
versa) for White-Minority segregation comparisons for CBSAs in 1990, 2000, and 
2010. It is readily evident here that when S is high, D also is high. But when S is 
low, values of D vary dramatically; sometimes they are low and sometimes they are 
high. This raises an obvious question, “When S is low and D (or G or R) is high, is 
there a compelling reason for assigning sociological importance to the high values 
of D (or G or R)?” I am not aware of a reason that is (or could be) grounded in the 
consequences segregation will have for sociologically important group differences 
in residential outcomes. 

The one reason that comes to mind is grounded, not in consequences for group 
differences in residential outcomes, but more literally in “volume of movement” 
consequences of policies seeking to redress segregation. High values of D do imply 
that a large fraction of one group must change area of residence to bring about even 
distribution. That can be sociologically consequential in policy situations such as 
school desegregation where students are literally redistributed across schools. 
Historically, the consequence has been especially important for minority 
populations who have often disproportionately borne the burden of busing. 

The sociological relevance of this volume of movement policy consequence can- 
not be denied. But its relevance for choosing segregation indices can be discounted 
for two reasons. The first is that it is “beside the point” because historically literal 
“volume of movement” policy implications of high values of D have almost always 
played out in contexts of “prototypical segregation” where values of S also are high. 
The driving concern behind the policy to redress segregation of course was that 
racial segregation adversely impacted life chances in education by creating group 
separation and unequal educational opportunities. The sociological import of D is 
fundamental and real; but it is beside the point for the issue under discussion because 
S captures the same concern and thus D does not identify a “life chances” implica- 
tion that S misses. 

The second reason is that the policy implications of a high value of D are much 
less likely to have practical consequences in situations where D is high and S is low. 
The basis for saying this is that policy concerns about reducing segregation usually 
are rooted in concerns about the impact of segregation on inequality in life chances. 
When D is high and S is low, groups live together and experience similar 
neighborhood outcomes. In these situations moving across neighborhoods to achieve exact 
even distribution will have limited impact on group differences in neighborhood 
outcomes. Thus, since policies to promote integration are unlikely to be pursued 
solely for the purpose of achieving exact even distribution without implications for 
life chances, the policy implications of D’s volume of movement interpretation are 
unlikely to come into play in practice. 

So we come back to the issue of why one would focus on values of D, or its 
technically superior “close cousins” G and R, over values of S. To argue that high 
values of R, G, and D are sociologically important when S is low, one must advocate 
two unusual views about the sociological relevance of residential outcomes. 


First, one must view differences on p as both very important over certain narrow 
ranges of p and also much less important over the rest of the logical range of p. 


Second, one also must view it as desirable to amplify this differential evaluation of 
differences on p by greater amounts when a city’s racial mix is imbalanced. 


To the best of my knowledge, no segregation researcher has articulated a compel- 
ling basis for assessing group differences in residential outcomes in this manner. 
Measurement approaches of this sort are not used when group differences on other 
socioeconomic outcomes such as education, occupation, and income are studied. So 
it is not obvious why such an approach would be seen as attractive when studying 
group differences in residential attainments relating to area racial mix and group 
contact. 

To be clear, I am not arguing that G, D, and R should not be used to measure 
uneven distribution. Researchers can be interested in uneven distribution for many 
different reasons. In some cases they may determine that one of these indices is the 
best choice to serve the needs of a particular study. As just noted, these measures 
might be defensible choices if one is interested in certain consequences of segrega- 
tion in relation to a social policy such as bringing about school integration where D 
could be seen as superior to S in signaling how much potential “social disruption” 
will be involved in achieving integration. This would be sociologically important 
regardless of whether movement to achieve integration brings about big changes in 
racial proportions in different schools. 

At the same time, I argue against the prevailing view that G, D, and R should be 
seen as the best available choices or even appropriate choices for serving most 
research interests. Personally, I am interested in measures of uneven distribution 
that are well suited for signaling the consequences segregation may have for group 
differences in residential outcomes that are both meaningful to individuals and 
households and relevant for life chances associated with residential outcomes. 
Given this focus, I am drawn to S because, among popular indices of uneven distri- 
bution, it registers residential outcomes that have clear and compelling implications 
for racial differences in residential outcomes. Focusing on the example of White- 
Black segregation, I know that when S is high, Whites and Blacks are residentially 
separated and are living apart from each other in neighborhoods that differ markedly 
on racial mix. I also know that there is a clear structural potential for the 
neighborhoods that Whites and Blacks live in to differ in other respects as well (e.g., 
amenities, crime, poverty, exposure to social problems, etc.). Furthermore, I know 
that when S is high, R, G, and D also will be high and as a result knowledge of their 
values adds limited additional information that is relevant to my concerns. 

When S is low, I know that Whites and Blacks are not residentially separated; 
instead, they are living together in the same neighborhoods. Because of this, I 
additionally know that, all else equal, the possibility for Whites and Blacks to 
experience fundamentally different neighborhood outcomes on other dimensions (e.g., 
amenities, crime, poverty, etc.) is logically constrained because people who reside 
in the same neighborhoods necessarily experience the same neighborhood out- 
comes. If S is exactly zero, Whites and Blacks cannot on average experience differ- 
ent neighborhood outcomes based on race alone. As S takes higher values, the 
logical potential increases for Whites and Blacks to differ on residential outcomes 
based on race alone. 

Of course S does not reflect all relevant aspects of race differences in residential 
outcomes by itself. Other characteristics such as income can interact with race and 
influence race differences in neighborhood outcomes. For example, a low-to- 
moderate level of S, say 15-20, could result because Whites and Blacks have sub- 
stantial contact across all income strata. Alternatively, the same level of S may result 
due to Blacks having higher levels of contact with low income Whites that offset 
Blacks having lower levels of contact with high income Whites. All else equal, the 
second scenario will be associated with greater White-Black differences in exposure 
to poverty and low income. This does not change the fundamental implications of 
high S versus low S situations. It merely acknowledges that consequences of racial 
differences in residential distributions are not necessarily simple. 

What can be said about White-Black neighborhood differences when R, G, and 
D take high values? This is much harder to pin down. When S is high, R, G, and D 
will be high. But the reverse is not true. S can be low when R, G, and D are high, 
particularly when group size is highly imbalanced. When this occurs, the high val- 
ues of R, G, and D do not provide a basis on their own for offering conclusions 
regarding White-Black differences in residential outcomes. This monograph has 
established that, as a matter of arithmetic, when R, G, and D have high values when 
S is low it is because these indices are responding strongly to small quantitative dif- 
ferences on neighborhood racial mix (p) over relatively narrow ranges of p. This 
provides little basis for speculating about the consequences of uneven distribution 
for residential differences. This is made worse by the fact that the “crucial” range of p 
varies from city to city depending on racial mix. For my research interests, this 
index behavior is not attractive. 

What about Theil’s H which I have not yet discussed? Like S, Theil’s H usually 
receives favorable treatment in methodological studies of segregation indices but has 
been used less frequently than D in empirical studies. For purposes of this discussion, 
H falls between S and indices rooted in the segregation curve (G, D, and R). Figures
6.5 and 6.6 introduced earlier show that the function y = f(p) for H is similar to the
same function for R in several respects. The nonlinearity in the y-p relationship is
similar in form, but the magnitude of the departure from linearity is much less
dramatic and the degree to which it varies with city racial mix (P) also is less
dramatic. So, in comparison with S, H has tendencies similar to those of R, but in
milder degree. Figures 12.1 and 12.2 document that H has similar tendencies to R in
terms of how changes in area racial composition (p) translate into changes in
residential outcomes (y). The figures document similarity in the form of the response.
One must note the values on the “Y” scale in the figures to see that the responses by
H are milder than the responses by R.

Fig. 12.2 Response of group contact (y) scored for Theil’s H by proportion White in area (p) and
selected values for city proportion White (P) (Curves reflect the response of Theil’s H to a change
in area proportion White (p) by the level of p and selected values of proportion White for the city
(P). y = f(p) = Q + [(E - e)/E]/(p/P - q/Q). Moving from darker curves to lighter curves, the values of
P are: 0.01, 0.05, 0.20, 0.50, 0.80, 0.95, and 0.99. The horizontal line is for reference and reflects
the “flat” response of the separation index (S))
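
The difference-of-group-means reading of S and H can be checked numerically. The following is a minimal sketch, assuming a hypothetical five-area, two-group city; the counts are invented for illustration and are not drawn from the data analyzed in this monograph. It scores each area with y = p for S and with the y = f(p) function given in the caption for Fig. 12.2 for H, takes the White mean minus the Black mean, and compares the result with the conventional aggregate formulas (the variance ratio for S and the entropy-based formula for H). All function and variable names are illustrative.

```python
import math

# Hypothetical (White, Black) counts for five areas; invented for illustration.
areas = [(90, 10), (80, 20), (50, 50), (20, 80), (70, 30)]

W = sum(w for w, _ in areas)
B = sum(b for _, b in areas)
T = W + B
P, Q = W / T, B / T  # city proportions White and Black


def entropy(p):
    """Two-group entropy for an area (or city) with proportion White p."""
    return -sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0)


E = entropy(P)  # city-wide entropy


def score_s(p):
    """Scoring for the separation index: y = p."""
    return p


def score_h(p):
    """Scoring for Theil's H: y = Q + [(E - e)/E] / (p/P - q/Q)."""
    e = entropy(p)
    return Q + ((E - e) / E) / (p / P - (1.0 - p) / Q)


def difference_of_means(score):
    """White mean minus Black mean of the area scores y = score(p)."""
    mean_white = sum(w * score(w / (w + b)) for w, b in areas) / W
    mean_black = sum(b * score(w / (w + b)) for w, b in areas) / B
    return mean_white - mean_black


s_from_means = difference_of_means(score_s)
h_from_means = difference_of_means(score_h)

# Conventional aggregate formulas, for comparison.
s_aggregate = sum((w + b) * (w / (w + b) - P) ** 2 for w, b in areas) / (T * P * Q)
h_aggregate = sum((w + b) * (E - entropy(w / (w + b))) for w, b in areas) / (T * E)

print(f"S: {s_from_means:.4f} (means) vs {s_aggregate:.4f} (aggregate)")
print(f"H: {h_from_means:.4f} (means) vs {h_aggregate:.4f} (aggregate)")
```

As written, the scoring function for H is undefined for an area where p equals P exactly (both numerator and denominator are zero); the illustrative counts avoid that case.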

What distinguishes H from R is this. H is rooted in a conception of uneven dis- 
tribution that draws on the information-theoretic notion of relative deviation from 
expected entropy. Individuals who find this conceptual approach attractive may 
accordingly prefer H. But like G, D, and R, H is differentially sensitive to changes 
in p over relatively narrow ranges and the relevant ranges vary with city racial mix. 
I am not aware of a basis for prizing this quality and leave it for others to make the 
case. 


12.1 An Example Analysis of Segregation and Exposure 
to Neighborhood Poverty 


I conclude this chapter by presenting an empirical analysis intended to speak to the 
issues reviewed here in a more “concrete” way. The issue I explore is whether high 
scores for measures of uneven distribution carry implications for racial stratification 
on residential and neighborhood outcomes. To investigate this, I used block group
data from Summary File 3 of the 2000 census and computed scores for the indices
of uneven distribution discussed in this section (specifically, G, D, R, H, and S) for
Core Based Statistical Areas (CBSAs). I computed scores for White-Minority
comparisons (specifically, White-Black, White-Latino, and White-Asian) using data
for non-Hispanic Whites, Blacks, and Asians. For economy of presentation, I
focus on the results for D and S, noting that index scores for G and R correlate
closely with scores for D and that scores for H take an intermediate position
between scores for D and S.

I additionally calculated group-specific exposure to neighborhood poverty based
on poverty rates for neighborhoods (calculated using data for the total population)
and also group-specific exposure to neighborhood income rank (percentile standing
based on the city-specific income distribution for the total population). I then
calculated the White-Black, White-Latino, and White-Asian differences on exposure to
neighborhood poverty and exposure to neighborhood income rank. The differences
were constructed so positive scores indicated White advantage.2 I restricted the
analysis to CBSAs where the minority group in the segregation comparison had a
population of 1,500 or more and where the number of block groups was adequate
for assessing segregation patterns.3 This resulted in 1,455 CBSA-group
comparisons: 571 White-Black comparisons, 605 White-Latino comparisons, and 279
White-Asian comparisons.
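
The exposure calculations described above can be sketched as follows. This is a minimal illustration, assuming four hypothetical block groups with invented counts and poverty rates. Exposure for a group is taken to be the mean of the neighborhood poverty rate weighted by the group's population in each block group, and the difference is formed as minority exposure minus White exposure so that positive values indicate White advantage. The names used are illustrative.

```python
# Hypothetical block groups: (White count, minority count, poverty rate in percent).
block_groups = [
    (900, 100, 5.0),
    (700, 300, 10.0),
    (300, 700, 20.0),
    (100, 900, 30.0),
]


def exposure(counts, rates):
    """Group-weighted mean of a neighborhood characteristic."""
    return sum(n * r for n, r in zip(counts, rates)) / sum(counts)


whites = [w for w, _, _ in block_groups]
minority = [m for _, m, _ in block_groups]
poverty = [r for _, _, r in block_groups]

white_exposure = exposure(whites, poverty)
minority_exposure = exposure(minority, poverty)

# Minority exposure minus White exposure: positive means White advantage.
poverty_gap = minority_exposure - white_exposure

print(f"White exposure to neighborhood poverty:    {white_exposure:.2f}")
print(f"Minority exposure to neighborhood poverty: {minority_exposure:.2f}")
print(f"Difference (White advantage if positive):  {poverty_gap:.2f}")
```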

I then addressed the following question: “Do scores on D and S for White-Minority
segregation carry similar or different implications for White-Minority differences
on these residential inequality outcomes?” To a certain extent they do carry
similar implications, at least in this analysis, as the scores for both D and S are
positively associated with White-Minority inequality on exposure to poverty and
neighborhood income rank. The White-Minority difference in exposure to neighborhood
poverty (coded so higher scores indicate White advantage) is correlated with D at
0.645 (r² = 0.417) and with S at 0.715 (r² = 0.512). The White-Minority difference
in exposure to neighborhood income rank (also coded so higher scores indicate
White advantage) is correlated with D at 0.619 (r² = 0.383) and with S at 0.702
(r² = 0.494). These results indicate that S provides a better signal for when
segregation carries implications for racial inequality in neighborhood outcomes. But in this
analysis D is not awful for this purpose. One reason for this is that scores on D and 
S are often concordant. The story changes substantially when attention is focused 
on cases where D and S are discordant. 

Probing more deeply into the data lends additional support to the idea that S is
more attractive than D for the purpose of signaling when it is likely that segregation
is associated with White-Minority inequality in residential outcomes. To do this, I
coded each White-Minority segregation comparison on the consistency of D and
S. Recall that D can take high values when S is low. Based on this, I classified
outcomes on D and S into four categories. The first is a baseline category of
“concordant” as occurs in prototypical segregation where displacement from even
distribution is substantially polarized. The other three categories capture D
exceeding S by increasingly large amounts. Holding D constant, distribution across the
three categories of D-S discrepancy indicates variation in the extent to which
displacement from even distribution is dispersed and produces lower levels of group
separation and neighborhood polarization.

2 The poverty difference is minority exposure minus White exposure. Positive scores indicate that
the minority group is exposed to higher levels of neighborhood poverty than Whites. The income
rank difference is White exposure minus minority exposure with positive scores indicating Whites
are exposed to neighborhoods that rank higher on income (White advantage).

3 The cut-off was at least 10 populated block groups. I replicated the results using cut-off values of
15 and 20 block groups. The results were the same.

I then estimated the regression of the White-Minority difference on exposure to 
neighborhood poverty on D and the three categories of D-S discrepancy. The mul- 
tiple R-square for the regression was 0.502 compared to 0.417 when using D alone. 
This indicates that knowing that D is discordant from S added to the ability to pre- 
dict the White-Minority difference in exposure to neighborhood poverty over what 
could be predicted from knowledge of D alone. As expected, the pattern of the
effects indicated that when D was high in relation to S, the White-Minority
difference in exposure to neighborhood poverty was lower (all effects were statistically
significant at p < 0.001). The impact of the largest D-S discrepancy category was
-4.3, which is clearly large in relation to the interquartile range of 6.9 for the
dependent variable.
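
The logic of this comparison can be sketched with synthetic data. The block below is a minimal illustration, not a reproduction of the analysis: the scores are randomly generated, the outcome is constructed to depend on S, and the cut points of 10, 20, and 30 points used to define the three D-S discrepancy categories are hypothetical, since the cut points actually used are not stated here. It fits an ordinary least squares regression of the outcome on D alone and then on D plus the three discrepancy dummies, and compares the R-square values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic scores on a 0-100 scale; not the CBSA data analyzed in the text.
s = rng.uniform(0, 80, n)
d = np.minimum(s + rng.uniform(0, 40, n), 100.0)  # built so that D >= S
gap = 0.15 * s + rng.normal(0, 2, n)              # toy outcome driven by S

# Hypothetical cut points (10, 20, 30) for the D-S discrepancy categories.
category = np.digitize(d - s, [10, 20, 30])       # 0 = concordant baseline
dummies = np.column_stack([(category == k).astype(float) for k in (1, 2, 3)])


def fit_ols(predictors, outcome):
    """Return (R-square, coefficients) for OLS with an intercept."""
    x = np.column_stack([np.ones(len(outcome)), predictors])
    beta, *_ = np.linalg.lstsq(x, outcome, rcond=None)
    residual = outcome - x @ beta
    return 1.0 - residual.var() / outcome.var(), beta


r2_d_only, _ = fit_ols(d, gap)
r2_d_plus, beta = fit_ols(np.column_stack([d, dummies]), gap)

print(f"R-square, D alone:                  {r2_d_only:.3f}")
print(f"R-square, D + discrepancy dummies:  {r2_d_plus:.3f}")
print(f"Effect of largest discrepancy category: {beta[-1]:.2f}")
```

Because the toy outcome is built from S, holding D constant and moving to a larger discrepancy category implies a lower S, so the dummy coefficients come out negative, which mirrors the direction of the effects reported above.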

I obtained similar results for the regression predicting the White-Minority differ- 
ence on neighborhood income rank. The multiple R-square for the regression using 
D and the three categories of D-S discrepancy as predictors was 0.483 compared to 
0.383 when using D alone. The results indicated that knowing that D was high in 
relation to S added to the ability to predict White-Minority difference in exposure to 
neighborhood income rank over what could be predicted from knowledge of D 
alone (all effects statistically significant at p < 0.001). As expected, discrepant
categories had lower levels of White-Minority inequality on income rank and the impact
of the largest D-S discrepancy category was -4.0, which is clearly large when
compared to the value of the interquartile range of 5.8 for the dependent variable.

I next estimated parallel regressions where S and categories of D-S discrepancy 
were used to predict White-Minority disadvantage in exposure to poverty and neigh- 
borhood income rank. The results were different and quite revealing. For the regres- 
sion of the White-Minority difference on exposure to neighborhood poverty the 
multiple R-square for the regression was 0.529 compared to 0.512 when using S 
alone. This signals that knowing D was high relative to S increased the ability to 
predict the White-Minority difference in exposure to neighborhood poverty by only 
a small amount over what could be predicted from knowledge of S alone. The
coefficients for the three categories of discrepancy were all statistically significant (all at
p < 0.001), but impacts were more modest than in the parallel analysis focusing on
D, as the largest effect here was 1.9, which was less than half the size of the largest
effect of -4.3 seen in the parallel analysis focusing on D.

I found similar results for the regression predicting the White-Minority difference 
on neighborhood income rank. The multiple R-square for the regression using S and 
the three categories of D-S discrepancy as predictors was 0.508 compared to 0.494
when using S alone. So, again, knowing that D was high relative to S increased
ability to predict White-Minority difference in exposure to neighborhood income rank
by only a small amount over what could be achieved from knowledge of S alone.
The effect coefficients for D-S discrepancy were statistically significant (all at
p < 0.001), but effects were small compared to the parallel analysis focusing on D,
as the largest effect was 1.5 compared to -4.0 in the parallel analysis focusing on D.

I draw the following conclusions based on these analyses. In comparison with 
the dissimilarity index (D), the separation index (S) speaks more directly to the 
question of whether uneven distribution is associated with group differences in resi- 
dential outcomes such as income and poverty. This is because S registers whether 
or not groups live separately in neighborhoods that are polarized on racial mix. This 
is a logical precondition for White-Minority differences on neighborhood-level 
stratification outcomes such as socioeconomic standing. D can take high values 
when groups live together in neighborhoods with similar racial composition and the 
logical potential for group differences in neighborhood outcomes is limited. 
Accordingly, S is the stronger predictor of White-Minority differences on 
neighborhood-based stratification outcomes such as indicators of neighborhood 
socioeconomic standing. Not surprisingly, I obtained parallel findings when
contrasting S with the Gini index (G) and the Hutchens square root index (R). This is
because these two measures correlate closely with D and can take high values when 
groups are not residentially separated. 

In view of these results, I suggest that researchers always examine multiple indi- 
ces and give particularly close attention to cases where S and D (or its close corre- 
lates) diverge. Such cases involve uneven distribution without residential separation 
and neighborhood polarization. These situations are likely to be fundamentally dif- 
ferent from cases of prototypical segregation where D and S both take high values. 
Specifically, group inequality in neighborhood-based residential outcomes is likely 
to be higher under a high level of prototypical segregation (i.e., a high-D, high-S 
combination) and lower under a high level of “displacement without separation” 
(i.e., a high-D, low-S combination). Personally, I am primarily interested in those 
aspects of segregation that have greater potential consequences for stratification in 
neighborhood outcomes and associated life chances. So I pay closer attention to S 
when S and D disagree.


Chapter 13
Relevance of Individual-Level Residential 
Outcomes for Segregation Theory 


The residential outcomes that give rise to segregation index scores can be assessed 
in terms of whether they are relevant for investigating different theories of segrega- 
tion dynamics. In the final analysis, theories of segregation must reckon with the 
micro-level dynamics that produce the residential patterns that aggregate indices 
summarize. It is easy to see how the residential outcome registered by S — namely, 
area racial mix (p) — is relevant for theories of residential attainment dynamics. For 
example, Lieberson advanced the hypothesis that segregation arises in part when 
Whites strive to maintain high levels of same-group contact and avoid more than 
incidental levels of contact with minorities (Lieberson 1980, 1981: 75; Lieberson 
and Carter 1982). Combining this hypothesis with the assumption that Whites have 
greater ability to influence residential dynamics leads to straightforward predictions 
regarding how S will vary when city racial composition varies over time or across 
cities. For example, the hypothesis that discrimination by Whites serves to keep
White contact with Whites from falling below fairly high levels (say 85% or higher)
leads to the prediction that S will vary as a positive, nonlinear function of proportion
Black in the city.1
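
The shape of this predicted relationship can be sketched under simplifying assumptions. The block below assumes two groups and a floor of 0.90 on White contact with Whites (P_WW); it further assumes that P_WW sits exactly at the larger of the floor and the city proportion White (P), which is a simplification introduced here for illustration. It relies on the identity that total White-Black contact is symmetric in the two-group case, W(1 - P_WW) = B(P_BW), so that P_BW = P(1 - P_WW)/Q and S = P_WW - P_BW. The function name and the floor parameter are illustrative.

```python
def predicted_s(q_city, min_white_contact=0.90):
    """S implied when White same-group contact is held at a floor.

    Assumes two groups and that P_WW equals the larger of the floor and the
    city proportion White (P_WW = P corresponds to even distribution).
    """
    p_city = 1.0 - q_city
    p_ww = max(min_white_contact, p_city)
    # Contact symmetry: W * (1 - P_WW) = B * P_BW, so P_BW = P * (1 - P_WW) / Q.
    p_bw = p_city * (1.0 - p_ww) / q_city
    return p_ww - p_bw


for q in (0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
    print(f"Q = {q:.2f}  S = {predicted_s(q):.3f}")
```

Under these assumptions S stays at zero while proportion Black is below one minus the floor, rises steeply just beyond that point, and then rises at a declining rate, which is the positive, nonlinear pattern described above.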

The implications for D, G, R, and H are much more complicated and indirect. I am
not aware of any theories that suggest Whites may specifically strive to attain or
avoid particular levels on the residential outcomes that determine the values of these
indices. Figure 5.1 introduced earlier shows that D, G, R, and H score values of p
differently across cities depending on the racial mix of the city. For present discussion,
consider D when formulated as a difference of means when neighborhoods are
scored as either 0 or 1 depending on whether p for the area exceeds P for the city. In this
formulation, a neighborhood where p is 90 would be scored 1 in a lower-P city such
as Birmingham and 0 in a higher-P city such as Minneapolis. The literature on race
and residential dynamics gives no basis for expecting residential outcomes to
revolve around these 0-1 scores instead of the original values of p. In contrast, the
literature does provide a basis for expecting p scored in its natural metric to predict
residential dynamics; specifically, Lieberson hypothesizes that Whites in all cities
will prefer residential outcomes of p = 95 over p = 85 and p = 85 over p = 75, and
so on. Thus, one can plausibly argue that S registers White-Black differences on
residential outcomes that are meaningful in residential attainment dynamics. I know
of no basis for making this kind of argument for the residential outcomes registered
by D, G, R, or H.

1 In a city where proportion Black is very low (say 1-5%), S can be low since Blacks can
experience high levels of contact with Whites without causing problems for Whites’ desires to have
limited contact with Blacks and high contact with Whites. This changes when proportion Black
increases. In order to maintain White contact with Whites (P_WW) at 0.90 or higher as proportion
Black in the city increases, Black contact with Whites (P_BW) must fall. This will cause S to increase
since, in the two-group case, S = P_WW - P_BW. The relationship will be nonlinear. Initially, S will
increase rapidly as proportion Black in the city (Q) increases; then the rate of increase will decline.
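
The 0-1 scoring that underlies D in the difference-of-means formulation can be illustrated with a short sketch. It assumes a hypothetical five-area, two-group city with invented counts, and it uses proportions rather than percentages for p and P. The city values of 0.70 and 0.95 used at the end are hypothetical stand-ins for a lower-P and a higher-P city; they are not the actual values for Birmingham or Minneapolis. The names are illustrative.

```python
# Hypothetical (White, Black) counts for five areas; invented for illustration.
areas = [(90, 10), (80, 20), (50, 50), (20, 80), (70, 30)]

W = sum(w for w, _ in areas)
B = sum(b for _, b in areas)
P = W / (W + B)  # city proportion White


def score_d(p, city_p):
    """0-1 scoring: 1 when area proportion White exceeds the city proportion."""
    return 1.0 if p > city_p else 0.0


# D as the White mean minus the Black mean of the 0-1 scores.
mean_white = sum(w * score_d(w / (w + b), P) for w, b in areas) / W
mean_black = sum(b * score_d(w / (w + b), P) for w, b in areas) / B
d_from_means = mean_white - mean_black

# Conventional formula for D, for comparison.
d_aggregate = 0.5 * sum(abs(w / W - b / B) for w, b in areas)

# The same area (p = 0.90) is scored differently depending on the city's P.
score_in_lower_p_city = score_d(0.90, 0.70)   # hypothetical lower-P city
score_in_higher_p_city = score_d(0.90, 0.95)  # hypothetical higher-P city

print(f"D: {d_from_means:.4f} (means) vs {d_aggregate:.4f} (aggregate)")
print(score_in_lower_p_city, score_in_higher_p_city)
```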

With this in mind, it is interesting to note that the results presented earlier in 
Table 10.1 indicate that the impact of relative minority size on segregation is much 
greater in the analysis of S than in the analyses for D. For example, cities that are at
4% and 25% Black are predicted to differ by 24.0 points on S but only 1.6 points
on D.2 Furthermore, the effects of relative minority size on patterns of residential
contact relating to S are more sensible in my view. For S, both White and Black 
contact with Whites declines as relative minority size increases, but the rate of 
decline is greater for Blacks thus leading to higher levels of group separation as 
minority size increases. This pattern is consistent with the Lieberson hypothesis. 
Contact with Whites as registered by D increases for both Whites and Blacks as 
relative minority size increases. These effects do not lend themselves to ready sub- 
stantive interpretation and in any event the pattern has minimal implications for 
city-level variation in segregation across cities. 

I conclude this discussion by noting again that it is unproductive to claim that any 
one segregation index is best for all circumstances and purposes. Accordingly, I 
advocate the following position. Ideally, researchers should be able to offer a sound 
justification for why a particular index is an appropriate choice for the substantive 
question(s) they are investigating. My comments endorsing S are rooted in a par- 
ticular set of research interests. I am interested in segregation as it relates to racial 
stratification and socioeconomic inequality and thus I assign priority to the implica- 
tions segregation may have for group differences in life chances linked with resi- 
dential outcomes. From this vantage point, I believe S registers outcomes that are 
meaningful to individuals and households and relevant to residential attainment 
dynamics that produce aggregate segregation. But I do not argue that this is the only
valid vantage point from which to advocate the use of a particular segregation
index. Others may offer good justifications for viewing other indices as valid and
attractive either for addressing specific research questions that interest them or on
various practical grounds. For example, while I have expressed reservations about
G and D based on the unusual way they register group differences in area racial mix,
I expect many researchers will continue to use them, especially D, in order to
maintain continuity with previous research and because they find D’s aggregate-level
“volume of movement” interpretation to be attractive.

2 These values on relative minority size translate to 0.14 and 0.50 on the square root of minority
proportion. The difference of 0.34 is multiplied by the effect coefficients of 4.79 and 70.51 in the
equations for D and S, respectively. This translates into 1.63 = 0.34(4.79) and 23.97 = 0.34(70.51).

I conclude with a practical observation. It is that sometimes index choice is not 
that important and it is easy enough to check to determine whether this is the case. 
Recall that the analyses reviewed in Chap. 6 provided evidence that popular indices
of uneven distribution correlated at very high levels (r² ≥ 0.85) when group size
was not highly imbalanced (e.g., when 0.10 < P < 0.90). It is easy to see if this
welcome situation prevails: examine the correlation of scores for D and S and check
to see if results differ using these two indices. When these two measures correlate
closely, all popular measures correlate closely. Accordingly, if the correlation is 
high and the key findings do not differ by index, it is safe to conclude that index 
choice is not an important factor in this situation. 
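
A check of this kind takes only a few lines. The sketch below is a minimal illustration using two invented sets of D and S scores, one in which the indices move together and one in which they do not. It computes the squared correlation between the two sets of scores; comparing it with 0.85 borrows the figure quoted above as a rough reference point and is not a formal decision rule. The function name is illustrative.

```python
import numpy as np


def r_squared(d_scores, s_scores):
    """Squared Pearson correlation between scores for D and scores for S."""
    r = np.corrcoef(d_scores, s_scores)[0, 1]
    return float(r ** 2)


# Invented scores for five cities where D and S move together.
concordant_d = [30, 45, 60, 75, 80]
concordant_s = [10, 25, 40, 60, 70]

# Invented scores for five cities where D is uniformly high but S varies widely.
discordant_d = [70, 72, 75, 78, 80]
discordant_s = [60, 10, 50, 15, 40]

r2_concordant = r_squared(concordant_d, concordant_s)
r2_discordant = r_squared(discordant_d, discordant_s)

print(f"r-square, concordant cases: {r2_concordant:.3f}")
print(f"r-square, discordant cases: {r2_discordant:.3f}")
```

A high value is only half of the check; one would still want to confirm that the key findings do not differ when the analysis is repeated with each index.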

When D and S are not highly correlated one must give the issue of index choice 
more attention. To the extent possible one should provide a sound justification for 
choosing to use a particular index. Additionally, it would be wise to check for and 
acknowledge whether key empirical findings and substantive conclusions vary 
depending on index choice.


Chapter 14
Index Bias and Current Practices 


Standard versions of indices of uneven distribution take their minimum value of 
zero only under the condition of exact even distribution. Most segregation research- 
ers and consumers of segregation studies are habituated to accepting this benchmark 
for social integration. On reflection, however, it is an unusual point of reference for 
assessing segregation. For one thing, exact even distribution usually is not logically 
possible because individuals, families, and households cannot be distributed in
fractional parts as almost always is needed to achieve exact even distribution. The
resulting departure from even distribution is likely to be negligible when
segregation is being assessed for broad group comparisons using relatively large spatial units
such as census tracts. But it will be non-negligible when measuring segregation for
small groups and/or when using small spatial units such as blocks. 

A second reason for viewing exact even distribution as an unusual reference 
point is that it does not correspond to the notion that race (or more generally “group 
membership”) is statistically unrelated to neighborhood of residence in keeping 
with the usual “baseline” null hypothesis adopted in studies seeking to assess quan- 
titative group disparities on socioeconomic outcomes. To the contrary, exact even 
distribution is an unexpected outcome under a model of random distribution wherein 
race and neighborhood are statistically independent. Thus the occurrence of exact 
even distribution can signal that race is systematically associated with residence 
through some kind of structured social dynamic (e.g., a group quota allocation 
process). 

As a consequence of these two factors, scores for all popular indices of uneven
distribution are inherently subject to upward bias in the following sense: they have
positive expected values when residential distributions of individuals and
households are random and thus standard indices will signal that segregation exists even
when there is no significant statistical association between group membership (e.g., 
race) and residential location. 
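
The upward bias described above is easy to see in a small simulation. The block below is a minimal sketch under stated assumptions: areas are of equal size, each resident is independently assigned to the minority group with probability equal to the city minority proportion (so group membership is unrelated to area by construction), and D is computed with the standard formula. The area sizes, the minority proportion of 0.05, and the number of replications are arbitrary choices made for illustration, and the function names are illustrative.

```python
import numpy as np


def dissimilarity(w, b):
    """Standard index of dissimilarity from area counts for two groups."""
    return 0.5 * np.abs(w / w.sum() - b / b.sum()).sum()


def expected_d_random(n_areas, area_size, prop_minority, n_sims=200, seed=0):
    """Average D over simulated cities in which group and area are independent."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_sims):
        # Minority count per area under random assignment.
        b = rng.binomial(area_size, prop_minority, size=n_areas)
        w = area_size - b
        if b.sum() == 0 or w.sum() == 0:
            continue  # D is undefined when a group is absent from the city
        scores.append(dissimilarity(w, b))
    return float(np.mean(scores))


# Small spatial units (20 residents) versus large units (2,000 residents).
small_units = expected_d_random(n_areas=100, area_size=20, prop_minority=0.05)
large_units = expected_d_random(n_areas=100, area_size=2000, prop_minority=0.05)

print(f"Average D under random assignment, small units: {small_units:.3f}")
print(f"Average D under random assignment, large units: {large_units:.3f}")
```

Both averages are positive even though no systematic segregation is built into the simulation, and the average is larger for the small units, which is in keeping with the point that bias is more of a concern for small groups and small spatial units.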

Index bias is a concern for several reasons. One is that, while bias is sometimes
negligible and can safely be ignored, bias can be and often is non-negligible. When
this is the case, bias can distort index scores and result in misleading assessments of
the level of segregation in a particular case as well as misleading assessments of
how the case in question compares with other cases including the same city at
another point in time. A second reason for concern about bias is that it varies in
complex ways that can make it difficult for researchers to diagnose its presence and
deal with its undesirable consequences. A third reason for concern is that, because
researchers are aware that bias can render index scores untrustworthy, they guard
against it by forgoing many kinds of segregation studies that they would otherwise
undertake if index scores could be trusted.

The current state of affairs presents difficult challenges to researchers. They want 
to view index bias as negligible for all cases in a given study so they can set aside 
concerns that assessments of segregation are untrustworthy when examining values 
of individual cases at a point in time, or when comparing values for a case over time, 
or when comparing values across cases. Unfortunately, it is not always safe to 
assume that scores can be trusted. In response to this situation, researchers routinely 
adopt multiple ad hoc strategies with the goal of avoiding and/or “dealing with” the 
undesirable consequences of index bias. 

A few methodological studies have advocated dealing with bias directly at the 
point of measurement by adjusting observed scores to remove the impact of bias and 
obtain unbiased scores (e.g., Winship 1977; Carrington and Troske 1997; Allen 
et al. 2009; Mazza and Punzo 2015). To date, however, few researchers have 
embraced such strategies. The main reason for this appears to be that the resulting 
index scores are complicated to explain and interpret and the best approaches to 
implementing the adjustments are technically and computationally demanding. 

What most researchers do instead is adopt “indirect” rather than “direct” 
approaches to dealing with index bias. That is, they measure segregation using 
“standard” (i.e., biased) versions of indices and then they adopt a variety of strate- 
gies to cope with the problem that scores may be differentially distorted by bias. 
Unfortunately, the strategies researchers use are a patchwork of informal, ad hoc 
practices. They are well-intentioned, but they are subject to criticism on multiple 
counts. The most important criticism is that the prevailing practices do not directly 
deal with index bias at the point of measurement for individual cases. Consequently,
index scores for individual cases that are suspected of being distorted by bias are never
“corrected” and in most studies these cases are not even identified. As a result,
index scores for individual cases affected by bias remain untrustworthy and cannot
be safely used for even elementary descriptive tasks such as: assessing the level of 
segregation for individual cities on a case-by-case basis, making direct comparisons 
of segregation between any two cases, assessing differences in segregation between 
different group comparisons for a single city, or following a single case over time. 

There is no sugar-coating the current situation. Prevailing practices for dealing 
with bias do not yield trustworthy segregation index scores for individual cases. At 
one level this is not surprising because the strategies researchers use to cope with 
index bias do not adopt the goal of obtaining trustworthy index scores for individual 
cases that have only a negligible amount of bias (e.g., less than 2 points). They 
instead employ a two-pronged strategy. They first try to screen out cases most likely
to be distorted by severe levels of bias. They then try to “work around” the problem
of moderate levels of bias for many of the “surviving” cases. The main strategies
researchers use in pursuing this approach are informal “rule-of-thumb” practices for
screening cases from the analysis and/or minimizing the undesirable consequences
of cases where bias is likely to be a non-trivial concern. Common strategies for
dealing with bias include the following:


• assess segregation using larger spatial units such as census tracts instead of
  smaller units such as blocks;

• focus on comparisons of broad group populations and avoid comparisons
  involving smaller subgroups within populations (for example, compare all Whites
  with all Blacks instead of comparing low-income Whites with low-income
  Blacks);

• apply a variety of ad hoc sample restrictions to exclude potentially problematic
  cases in the full data set from the subset of cases used for the final analyses; and

• weight cases in the analysis data set differentially in hopes of minimizing the
  influence that potentially problematic cases may exert on results.


These strategies and ones similar to them are widely used primarily because they 
are easy to implement. More rigorous alternative approaches are available but are
rarely adopted, partly because they are less well known but also because they are
more complex and demanding. I view the current state of affairs with concern. First,
as I noted above, the practices researchers use do not improve the measurement of 
index scores at the level of individual cases. Second, the “protective” practices are 
applied inconsistently and in patchwork fashion. Third, there is little formal meth- 
odological work to show that the practices being used are in fact effective in elimi- 
nating and/or minimizing the undesirable impact of untrustworthy index scores. 

Finally, and perhaps most importantly, I worry that the “cures” adopted for deal- 
ing with index bias have undesirable side effects that in some cases may be “as bad 
as the disease.” In particular, prevailing practices restrict the scope of segregation 
studies and constrain research designs in nonrandom and ultimately undesirable 
ways. They shift study designs toward investigating a narrower set of questions that 
can be addressed using a smaller subset of cases and group comparisons where 
standard index scores are viewed as more trustworthy. 

Obviously, this is not the situation researchers want. They would prefer to have 
trustworthy index scores for as many cases as possible and for as wide a range of 
group comparisons and research situations as possible. Happily, the difference of 
means framework I introduce in this monograph makes it possible to take a major 
step toward this goal. Working from within this framework I am able to develop 
refined versions of widely used indices of uneven distribution to correct the problem 
of index bias directly at the point of measurement. The new measures are attractive 
on several counts. First, they are not exotic or dramatically different. They are 
refined versions of popular indices and researchers do not have to adopt unfamiliar 
approaches to measuring uneven distribution. Second, the refinements that yield 
unbiased versions of indices involve minor adjustments in index calculations that 
are simple and easy to implement but yet very effective in providing robust protection
against index bias over a broad range of conditions and group comparisons. Third, 
the technical basis for achieving unbiased index scores allows researchers to con- 
tinue to invoke familiar substantive interpretations of popular indices with only 
subtle changes. Finally, the new measures can be used at little cost or risk. When 
bias in fact is negligible, as sometimes is the case, scores of unbiased versions of 
indices track scores of standard versions very closely and the two versions will yield 
essentially identical results. The scores for standard and unbiased versions of indi- 
ces differ only when bias is non-negligible and scores for standard versions of indi- 
ces do not yield trustworthy assessments of uneven distribution. 

Based on these points I suggest that the unbiased versions of indices that I intro- 
duce in this monograph provide valuable new alternatives for research. They can be 
used interchangeably with standard versions of indices in any situation where stan- 
dard index scores can be trusted and results will be the same. But, more importantly, 
the unbiased versions can be used in many additional situations where standard 
indices cannot be safely used. Thus, the unbiased versions of index scores I intro- 
duce here expand the potential scope of segregation studies to include group com- 
parisons and study situations that researchers currently would avoid. 

I devote the remainder of this chapter to the task of “setting the stage” for intro- 
ducing the unbiased versions of popular indices. I serve this goal by first reviewing 
the general problem of index bias. I then review the prevailing practices researchers 
use to try to minimize the undesirable effects of index bias and note my concerns 
about these practices focusing on technical questions of their efficacy considered 
narrowly and also on the insidious impact of these practices on segregation research 
more broadly. Finally, I review options that have been previously suggested for how 
bias might be addressed directly at the point of measurement and consider why they 
have not gained wider adoption. The existence of this chapter indicates that I believe 
it is worthwhile to review these topics in some detail. However, I will not be sur- 
prised and will take no offense if some readers choose to skip forward to Chap. 15 
where I outline the basis for formulating unbiased versions of popular indices and 
Chap. 16 where I review their behavior in empirical applications. I turn now to 
reviewing basic issues and current practices. 


14.1 Overview of the Issue of Index Bias 


The dissimilarity index (D) is the most widely used measure of uneven distribution 
so it comes as no surprise that it has received especially close scrutiny on the issue 
of index bias. Taeuber and Taeuber (1965) provided a thoughtful early discussion of 
the issue in their appendix chapter reviewing issues in segregation measurement. 
They noted that zero, the value of D that signals integration conceived as exact even 
distribution, does not obtain under random distribution and furthermore is usually 
logically impossible even under strategic, purposive assignment because 
individuals and households cannot be assigned in fractional parts (1965: 231-235).1 
Later methodological studies characterized D’s positive expected value under random 
assignment (i.e., E[D] > 0) as “bias” and raised awareness that bias in D varies 
in complex ways that can make scores for D problematic in many situations (e.g., 
Cortese et al. 1976; Winship 1977). The issue has now received regular attention for 
four decades and a large literature has grown with contributions from many meth- 
odological studies that have considered the nature of index bias, its practical conse- 
quences, and possible approaches for diagnosing and dealing with it (e.g., Taeuber 
and Taeuber 1976; Cortese et al. 1976, 1978; Blau 1977; Winship 1977, 1978; 
Massey 1978; Falk et al. 1978; Farley and Johnson 1985; Boisso et al. 1994; 
Carrington and Troske 1997; Ransom 2000; Allen et al. 2009; Mazza and Punzo 
2015). 

Consensus exists on many important points relating to certain technical aspects 
of index bias. Several key understandings trace to Winship’s (1977) influential early 
analysis of the bias behavior of D and S. Of particular note, Winship introduced two 
analytic formulas for calculating the expected value of D (denoted by E[D]) under 
random distribution. Both formulas are based on a formal model of random distribution 
of households from two groups over areas of constant population size (t_i). He 
termed one formula “exact” because it implements detailed calculations based on 
the binomial probability distribution and can be applied at both small and large 
values of area population size. He termed the other formula an “approximation” 
because it draws on simpler calculations that yield satisfactory results when area 
population size is not small (i.e., when t_i ≥ 25). Examining the approximation 
formula, $E[D] = 1/\sqrt{2\pi t_i PQ}$, clarifies how E[D] varies over study design and 
demographic conditions. Specifically, it reveals that two terms, area population size (t_i) 
and the relative size of the reference group (P), determine how the value of E[D] 
varies with city racial composition and with study design (i.e., the size of spatial 
units used in assessing segregation). 
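
To make the dependence on area population size and racial composition concrete, the following minimal Python sketch evaluates the approximation formula E[D] = 1/sqrt(2π · t_i · P · Q) over a small grid. The function name and the grid values are hypothetical and chosen only for illustration; the sketch covers the approximation formula alone and should not be read as a substitute for the "exact" formula when areas are small.

```python
import math


def expected_d_approx(t_i, p):
    """Approximate E[D] under random distribution: 1 / sqrt(2 * pi * t_i * P * Q)."""
    q = 1.0 - p  # Q is determined by P
    return 1.0 / math.sqrt(2.0 * math.pi * t_i * p * q)


# Hypothetical grid: area pairwise population sizes and reference-group shares.
for t_i in (25, 100, 1000):
    cells = ", ".join(
        f"P={p:.2f}: {expected_d_approx(t_i, p):.3f}" for p in (0.50, 0.20, 0.05)
    )
    print(f"t_i={t_i:>4}  {cells}")
```

Reading across a row shows E[D] rising as P moves away from 0.5; reading down a column shows it falling as t_i grows.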

The first term, the area pairwise population count (t_i), has an inverse relationship 
with E[D]; all else equal, E[D] declines as t_i increases. This relationship can provide 
a rationale for why research moved from once commonly assessing segregation 
using small areas such as blocks to more often using larger areas such as census 
tracts. It also provides a rationale for avoiding group comparisons which involve 
small combined populations. The practice provides a measure of protection against 
index bias, but this comes with substantial costs. It eliminates the option of investi- 
gating segregation in smaller cities and communities where tracts are too big to 
capture segregation patterns. It also eliminates the option of studying segregation 
involving small groups and subpopulations. 


1 The latter point is not widely appreciated but deserves greater attention because individuals 
typically are imbedded in a family or household that sociologically cannot be viewed as divisible and 
in most cases will be racially homogenous. Accordingly, Winship (1977) advocates assessing 
segregation for households rather than individuals because individuals within households are not 
“statistically independent”. 

The second key term in Winship’s approximation formula for E[D] is PQ, the 
product of group population proportions. The value of this term is controlled by P — 
the pairwise proportion of the reference group in the combined city-wide population 
of the two groups. P in turn determines Q, the pairwise proportion of the comparison 
group, based on Q = 1 - P, and so also determines the value of PQ. The value of 
PQ has an inverse relationship with E[D]; all else equal, E[D] is lower when PQ is 
higher. The maximum for PQ occurs when the two groups are equal in size 
(P = Q = 0.5). So bias in D (E[D]) grows larger as groups become more imbalanced 
in size (i.e., as P departs from 0.5). This relationship can provide a rationale for 
excluding cases from analysis when one group in the segregation comparison is 
small in relative size. Again, the practice provides protection against index bias, but 
it comes at a cost; it eliminates the option of investigating segregation in communi- 
ties where groups are imbalanced in size. Thus, for example, it precludes the pos- 
sibility of investigating segregation in the initial stages of a new group’s entry into 
a residential system since group size will in most cases be highly imbalanced. 

Winship assessed the impact of area population size (t_i) and city racial composition 
(P) on index bias (E[D]) by tabulating the values of E[D] obtained from analytic 
formulas over varying combinations of t_i and P. The results he reported showed 
that area size and city racial composition have complex, non-linear, non-additive 
effects on E[D]. Later studies confirm his findings with similar results obtained by 
analytic and simulation exercises investigating the issue of index bias (e.g., 
Carrington and Troske 1997; Allen et al. 2009; Mazza and Punzo 2015). I summa- 
rize the most important findings of these studies as follows. 


• D is subject to bias under all conditions; that is, the expected value of D under 
random distribution always is greater than zero (i.e., E[D] > 0). 

Significantly, the positive value of E[D] truncates the range of D in empirical 
analyses by setting a “floor” for the minimum value below which D is unlikely 
to fall in the absence of exceptional circumstances (e.g., assignment of individuals 
and households by quota and in fractional parts). 

• In many situations the index value of 0, which obtains only under exact even 
distribution, is not logically possible due to the integer nature of population 
counts and the non-independence of individuals in families and households.2 

• The magnitude of bias for D varies inversely with the population size of areal 
units (t_i). 

Other things equal, E[D] grows smaller as area population size grows larger; 
it moves toward being negligible when area population size is very large. 

• The magnitude of bias varies inversely with pairwise balance in city racial 
composition. 


2 By non-independence of individuals in families and households I mean that attaining exact even 
distribution would require that individuals living together within families and households would be 
separated and distributed independently across areas. 

• Other things equal, E[D] grows larger as city racial composition becomes more 
imbalanced. More exactly, E[D] is lowest when P = Q = 0.5 and increases at an 
increasing rate as P departs further from 0.5. 

• The joint impact of area population size (t_i) and city racial composition (P) on the 
magnitude of bias is complex. Specifically, the effects of each factor are non-additive 
and nonlinear such that a bias-promoting change in one factor amplifies 
the other factor’s impact on bias. 
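
The non-additive pattern in the last point can be read directly off the approximation formula for E[D]. The following is only a sketch under that approximation (so it applies when area population size is not small); it is a restatement of the formula, not a result taken from the studies cited.

```latex
E[D] \approx \frac{1}{\sqrt{2\pi\, t_i\, PQ}}
\quad\Longrightarrow\quad
\ln E[D] \approx -\tfrac{1}{2}\ln(2\pi) \;-\; \tfrac{1}{2}\ln t_i \;-\; \tfrac{1}{2}\ln(PQ)

\frac{\partial E[D]}{\partial t_i} \approx -\,\frac{E[D]}{2\, t_i}
```

The two factors are additive only on the log scale. In levels they combine multiplicatively, so the change in bias produced by a given change in t_i is proportional to the current level of E[D], and that level is larger when PQ is small (that is, when group sizes are imbalanced).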


The most important conclusion to be drawn from these studies is more general 
and deserves to be separated from the others. 


Bias can be non-trivial in magnitude in many cases and it can vary greatly in mag- 
nitude from case to case including different cities, different group comparisons, 
or a given city-group comparison tracked over time. 

Consequently, bias can complicate measurement and potentially lead researchers to 
draw incorrect conclusions about the levels and patterns of variation in uneven 
distribution across group comparisons, across cities, and over time. 


Significantly, all of the points just listed apply to all popular indices of uneven 
distribution except one. More specifically, the points listed above apply to the Gini 
index (G), the Atkinson index (A), the Hutchens square root index (R), and the Theil 
entropy index (H). One popular measure — the separation index (S) — is an excep- 
tion; index bias is less of a problem for this index than for any other widely used 
index of uneven distribution. 

Bias for S is smaller in magnitude than for any other popular index. In addition, 
variation in bias for S across cases is less complicated than for any other popular 
index. The major reason for this is that bias for S is determined by just one factor, 
area population size (t_i), with E[S] being given by the simple calculation E[S] = 1/t_i 
(Winship 1977). Thus, in contrast to other indices, bias for S does not vary with city 
racial composition (P). Accordingly, analyses reported in Chap. 16 show that the 
separation index (S) exhibits a lower level of bias than other indices under all condi- 
tions and especially when city racial composition is imbalanced. Indeed, the levels 
of bias for the separation index (S) are so much lower and so much less complicated 
than for other indices, this alone could be a compelling reason to always consider 
using S in empirical analyses. That said, E[S] is never zero and bias can render 
scores for S problematic in some extreme circumstances. Consequently, while using 
S to measure uneven distribution can go a long way to protecting against the poten- 
tial distorting impact of index bias, using S cannot in itself guarantee that bias does 
not adversely affect index scores. 
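
The contrast between the two indices can be illustrated numerically with the formulas quoted in this section. The sketch below is illustrative only: the function names and the grid of values are hypothetical, and the E[D] values rely on the approximation formula, which is intended for areas that are not small.

```python
import math


def expected_s(t_i):
    """E[S] under random distribution, using E[S] = 1 / t_i."""
    return 1.0 / t_i


def expected_d_approx(t_i, p):
    """Approximate E[D] under random distribution: 1 / sqrt(2 * pi * t_i * P * Q)."""
    return 1.0 / math.sqrt(2.0 * math.pi * t_i * p * (1.0 - p))


t_i = 100  # hypothetical area pairwise population size
for p in (0.50, 0.20, 0.05):
    # E[S] is the same in every row; only E[D] responds to racial composition.
    print(f"P={p:.2f}  E[S]={expected_s(t_i):.3f}  E[D]={expected_d_approx(t_i, p):.3f}")
```

Holding t_i fixed, the E[S] column does not change with P while the E[D] column grows as P departs from 0.5.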


14.1.1 Effective Neighborhood Size (ENS): A Further 
Complication 


Previous methodological studies provide valuable insights about the nature of index 
bias. Unfortunately, however, these insights do not necessarily provide an adequate 
basis for diagnosing the presence of bias in empirical studies. The reason for this is 
that the expected values of index scores (i.e., E[*]) under random assignment are 
more complicated in empirical studies than in analytic studies. Three factors pose 
difficulties for researchers seeking to assess and deal with index bias in empirical 
studies. 


e Neighborhood size often varies substantially across spatial units. 

e The non-negligible presence of other groups not included in the segregation 
comparison often varies markedly across cases. 

e The extent to which other groups not included in the segregation comparison co- 
reside with the two groups in the comparison often varies across cases. 


Each of these three factors complicates bias because each affects the value of t_i 
which, as noted above, plays a central role in determining the expected values of 
indices under random assignment (i.e., E[*]). In empirical studies area population 
size (t_i) can be highly variable and this makes its impact on E[*] more difficult to 
establish. As a rule of thumb, t_i varies in predictable ways across the kind of areal 
units used in measuring segregation. For example, t_i is lower when using census 
blocks compared to census tracts. So, all else equal, one can safely expect bias will 
be a greater concern for blocks than for tracts. But there is a further complication in 
empirical studies; the population size of the areal unit used (e.g., tracts) can vary 
considerably across units. 

The exact impact of variation in area size (t_i) on bias can be complicated to assess 
for a given case. But it is easy to grasp that it can be important because empirical 
distributions of population counts for areas often span a wide range and tend to be 
skewed right with unusual outliers. Variation in area population size occurs for 
many reasons including: differences between areas with high-density apartment 
buildings vs. areas with low-density, single-family detached housing; the presence 
of non-institutional group quarters such as work camps, college dorms, military 
barracks, convents, etc.; and the presence of institutional group quarters such as 
prisons, facilities for the elderly and disabled, and other institutions. As a result, it 
can be inappropriate to use a single value of area population size (t_i) when estimating 
E[D] by analytic formulas. As an alternative, one could extend the formulas for 
E[D] to take account of variation in area size. Another alternative is to adopt 
computation-intensive methods such as estimating the sampling distribution of 
E[D] under random distribution using city- and comparison-specific bootstrap simulations 
as advocated by Carrington and Troske (1997) and Allen et al. (2009). 
Unfortunately, all of these options introduce complexity and substantial computational 
burdens and so are unlikely to be widely adopted by researchers.3 
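
One way an extension of the analytic formula to areas of unequal size might look is sketched below. It is an assumption made for illustration, not a formula from the sources cited: it applies the large-area normal approximation for the mean absolute deviation of a binomial proportion area by area, where T denotes the city-wide pairwise total (the sum of the t_i) and p_i denotes the reference-group proportion in area i.

```latex
D = \frac{1}{2TPQ}\sum_i t_i\,\lvert p_i - P\rvert ,
\qquad
E\,\lvert p_i - P\rvert \approx \sqrt{\frac{2PQ}{\pi\, t_i}}

E[D] \approx \frac{1}{2TPQ}\sum_i t_i \sqrt{\frac{2PQ}{\pi\, t_i}}
      = \frac{\sum_i \sqrt{t_i}}{T\,\sqrt{2\pi PQ}}
```

When every area has the same size t, the sum of the square roots equals N√t and T = Nt for N areas, so the expression reduces to 1/√(2π t PQ), the constant-size approximation. Like that approximation, the sketch should not be expected to perform well when many areas are small.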

The next complication arises when other groups not in the segregation compari- 
son are present in the city population. To see this, first note that, strictly speaking, it 
is not area population size per se that is relevant to index bias; it is the “pairwise” 
population count in the area. In view of this I introduce the term “effective neigh- 
borhood size” (ENS) to refer to the value of the combined population counts for the 
two groups in the comparison in the areal unit. The value of effective neighborhood 
size (ENS) sometimes corresponds to the value of area population size, but ENS is 
conceptually distinct and can depart from overall area population size. Indeed, ENS 
can take dramatically different values from overall area population size when the 
combined relative size of other groups not in the segregation comparison is large.4 

Effective neighborhood size (ENS) equals area population size (t_i) only when the 
city population consists of just the two groups in the segregation comparison. This 
situation is often assumed in methodological studies to simplify analysis, but the 
assumption is untenable in empirical studies where the presence of other groups in 
the population can cause the value of effective neighborhood size (ENS) to depart 
dramatically from overall area population size. Under random distribution for all 
groups ENS will be smaller than area population size and estimates of index bias 
based on overall area population size will be too low. This can cause commonly 
used “rules-of-thumb” for protecting against bias to fail. For example, researchers 
may use census tracts as the spatial units for assessing segregation in hopes that bias 
will be negligible because tract populations are large. But ENS can still be low even 
when using census tracts if the two groups in the segregation comparison are both 
small. For example, this might occur when investigating the segregation of Asian 
subgroups (e.g., the Chinese and Korean subpopulations) or when investigating seg- 
regation across income subgroups (e.g., Whites and Blacks in the top quintile or 
decile of the distribution of household income). 

In simple situations one could replace the value of area population size with a 
smaller value of ENS by multiplying average area population size by the proportionate 
representation of the two groups in the comparison in the total population.5 
Unfortunately, this is inadequate because the value of effective neighborhood size 
(ENS) is affected by another complicating factor; namely, the extent to which the 
other groups in the city population co-reside with the two groups in the segregation 
comparison. If the other groups co-reside extensively with the two groups in the 
comparison (as would be the case under random distribution of all groups), ENS 
will be smaller than area population size (t_i) and approach its minimum possible 
value. All else equal, index bias would then be higher. But, if the other groups in the 


3 Analytic techniques advocated by Mazza and Punzo (2015) may help reduce the computational 
burden. But the task will still likely be too complex for wide adoption in empirical studies. 

4 The impact of this factor has not previously been carefully studied. I provide analyses in the next 
chapter that show how it can have important impacts on E[D] and how these impacts can make 
previous strategies for dealing with index bias (e.g., Winship 1977) less effective. 

5 For the moment I set aside the factor of variation in area size. 

population are completely segregated from the two groups in the comparison, the 
two groups of interest will be the only groups present in the areas where they reside. 
In this situation the value of ENS then will take its maximum possible value and 
match area population size. All else equal, index bias would then be lower. The 
“correct” value of ENS in empirical analyses will typically fall somewhere between 
these minimum and maximum values depending on whether the other groups in the 
city population are weakly or strongly segregated from one or both of the groups in 
the segregation comparison. Since multi-group distributions vary widely in real cit- 
ies, this issue carries complex implications for index bias and greatly complicates 
the assessments of index bias across group comparisons and cities. 
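
The two limiting cases described above can be illustrated with a toy calculation. The Python sketch below uses entirely hypothetical counts (a city of four equal-sized areas and three population groups), and the function names are assumptions introduced only for this illustration.

```python
def effective_neighborhood_sizes(areas):
    """Return the pairwise count (group A + group B) for each area.

    `areas` is a list of (group_a, group_b, other) population counts.
    """
    return [a + b for a, b, _other in areas]


def mean_ens(areas):
    """Average ENS over areas where at least one member of either group lives."""
    sizes = [s for s in effective_neighborhood_sizes(areas) if s > 0]
    return sum(sizes) / len(sizes)


# Hypothetical city: 120 in group A, 80 in group B, 200 in other groups,
# living in four areas of 100 residents each.
mixed = [(30, 20, 50)] * 4              # other groups co-reside evenly with A and B
separated = [(60, 40, 0), (60, 40, 0),  # other groups live entirely apart
             (0, 0, 100), (0, 0, 100)]

pairwise_share = 200 / 400                # share of A and B in the total population
simple_adjustment = 100 * pairwise_share  # average area size x pairwise share

print(mean_ens(mixed))      # minimum case: equals the simple adjustment
print(mean_ens(separated))  # maximum case: equals area population size
```

In this toy city the simple proportional adjustment reproduces ENS only in the fully mixed case; once the other groups live apart, ENS doubles even though city-wide composition and area sizes are unchanged.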

In sum, the distinction between overall area size and effective neighborhood size 
(ENS) and the other complications noted above can have important practical impli- 
cations for assessing index bias. When ENS is known with precision, analytic for- 
mulas for calculating expected values of bias (e.g., E[D]) can potentially provide a 
reasonable guide to identifying when bias is negligible or problematic. When one or 
more of the complications noted in the above discussion are present, the same for- 
mulas can yield incorrect expected values of bias. Previous methodological studies 
have not recognized this problem. As a result, strategies for dealing with bias that 
rely on estimating expected values of index scores under random distribution (E[*]) 
can perform poorly in empirical studies. 


14.1.2 The Practical Relevance of Variation in Effective 
Neighborhood Size 


In the face of these complications, one option is to estimate values of E[*] by boot- 
strap simulation methods (per Carrington and Troske 1997; Allen et al. 2009, and 
Mazza and Punzo 2015). In principle, applying these methods with observed resi- 
dential distributions can yield superior results of E[*] because the estimates do not 
depend on simplifying assumptions about the value of effective neighborhood size 
(ENS). 

I explored using this option by examining expected values of the dissimilarity index 
(D) for block-level segregation between Whites and Blacks for CBSAs in 2000. For 
this analysis I computed values of E[D] by three methods. First I computed two 
values of E[D] using Winship’s (1977) “approximation” and “exact” formulas. To 
establish the value of ENS to use in the formulas, I calculated the median value of 
ENS over blocks in the CBSA that had nonzero counts for the combined White and 
Black population. I additionally computed values of E[D] based on bootstrap simulations 
that do not make simplifying assumptions about ENS.6 
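
A minimal sketch of a bootstrap exercise of this general kind is shown below in Python. It is not the code used for the analysis reported in this chapter; the function names, the toy counts, and the choice to hold each area's pairwise total fixed while shuffling group labels are assumptions made for illustration.

```python
import random


def dissimilarity(counts):
    """D = 0.5 * sum |w_i/W - b_i/B| for a list of (white, black) area counts.

    Assumes both groups have at least one member city-wide.
    """
    total_w = sum(w for w, _ in counts)
    total_b = sum(b for _, b in counts)
    return 0.5 * sum(abs(w / total_w - b / total_b) for w, b in counts)


def bootstrap_expected_d(counts, reps=1000, seed=0):
    """Estimate E[D] by randomly reassigning individuals to the observed areas.

    Each area's pairwise total (its ENS) is held at its observed value;
    only the group labels are shuffled.
    """
    rng = random.Random(seed)
    sizes = [w + b for w, b in counts]
    total_w = sum(w for w, _ in counts)
    total_b = sum(b for _, b in counts)
    labels = [1] * total_w + [0] * total_b  # 1 = White, 0 = Black
    running_sum = 0.0
    for _ in range(reps):
        rng.shuffle(labels)
        start = 0
        simulated = []
        for size in sizes:
            whites = sum(labels[start:start + size])
            simulated.append((whites, size - whites))
            start += size
        running_sum += dissimilarity(simulated)
    return running_sum / reps


# Hypothetical, fully segregated toy city with four small areas.
observed = [(10, 0), (0, 10), (10, 0), (0, 10)]
print(dissimilarity(observed))                   # observed D
print(bootstrap_expected_d(observed, reps=200))  # estimated E[D] under randomness
```

Because the simulation works with the observed area-by-area pairwise totals, it needs no summary value of ENS such as a mean or median.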


6 Specifically, I estimated E[D] by distributing White and Black individuals randomly over areas 
where Whites and Blacks were found in the observed residential distribution for the city and 
calculating D from the resulting random distribution. I repeated the exercise 1000 times for each 
CBSA and took the average of the obtained values of D as the estimate of E[D] for the CBSA. 

I found that the values of E[D] based on the three methods were highly correlated 
(r² > 0.95). But, importantly, they were not exact substitutes for one another and the 
differences often had important consequences. First, while values of E[D] corre- 
lated across methods, the average values of E[D] varied by method. Values of E[D] 
based on Winship’s approximation formula were much higher than those based on 
the exact formula (consistent with results presented in Winship (1977)). Second, 
values of E[D] based on bootstrap simulation methods were lower than values 
obtained using analytic formulas. Specifically, estimates of E[D] based on 
Winship’s exact formula were on average 40% higher than estimates from bootstrap 
simulations. This indicates that estimates of bias obtained from analytic formulas will be 
too high and adjustments of index scores using estimates of bias based on analytic 
formulas would tend to significantly “over-correct” and yield estimates of unbiased 
segregation that are too low. 

I conducted similar exercises for other popular indices of uneven distribution; 
specifically, G, R, H, and S. The results for these indices were similar to what I just 
described for D. Estimates of bias obtained by analytic formula were higher than 
estimates based on bootstrap simulation methods. The key point for present con- 
cerns is that the magnitude of estimates of E[*] varies by method. This indicates that 
estimating expected values of index scores under random distribution is not a simple 
task in empirical studies. For now it appears that the most accurate alternative is to 
use the computationally demanding method of bootstrapping (per Carrington and 
Troske (1997) and Allen et al. (2009)) to obtain estimates of expected values (E[*]) 
of measures of uneven distribution. The estimates are superior because they do not 
rely on strong assumptions (i.e., that areas are all the same size and that effective 
neighborhood size is constant across areas) but instead directly incorporate the 
observed variation in ENS across areas. Unfortunately, the practical burdens associ- 
ated with this approach will deter most researchers from adopting the methods. 


14.1.3 Random Distribution Is a Valid, Useful, 
and Conceptually Desirable Reference Point 


The literature on segregation measurement includes many statements noting that 
random distribution can serve as a valid and desirable reference point for assessing 
segregation (e.g., Jahn et al. 1947; Reiner 1972; Zelder 1972; Cortese et al. 1976; 
Winship 1977; Blau 1977; Boisso et al. 1994; Carrington and Troske 1995; 
Carrington and Troske 1997; Ransom 2000; Allen et al. 2009; Mazza and Punzo 
2015). For example, Cortese, Falk, and Cohen offer the succinct argument that it is 
“natural” to “construct an index which takes a value of zero when the distribution is 
random” (1976: 631). The unbiased measures suggested by Winship (1977), 
Carrington and Troske (1995, 1997), Allen et al. (2009), and 
Mazza and Punzo (2015) all have this property. The measures I introduce in Chap. 
15 also have this property. 

One obvious benefit is that when indices have this property the value of zero can 
then serve as the reference point for evaluating whether the index value obtained 
indicates that race or other group membership plays a role in segregation over and 
above the consequences of chance. Using indices with this quality would bring seg- 
regation research into conformity with long-standing convention in the study of 
group disparities in socioeconomic outcomes. Inequality research in all domains 
except the study of residential segregation evaluates group disparities on socioeconomic 
outcomes (e.g., education, occupational status, income, etc.) based on comparisons 
of group means that take expected values of zero when group membership 
(i.e., race) has no statistical association with the stratification outcome in question. 

No significant objection has been or can be raised against the goal of seeking 
“unbiased” segregation indices with these properties. Taeuber and Taeuber (1976) 
and Winship (1977) have correctly noted that segregation resulting from random 
factors can be substantively meaningful in its own right. But this of course does not 
undercut the desirability of having unbiased indices whose scores provide a trust- 
worthy signal that segregation departs from levels expected under random distribu- 
tion. Winship argues that measures possessing this quality are especially desirable 
when interest is focused on the causes of segregation rather than its consequences 
(1977: 1065). Moreover, even when one is interested in the consequences of segre- 
gation, it can be valuable to know whether the segregation involved reflects system- 
atic social dynamics, stochastic variation in residential distributions, or artifactual 
components of index values. 


14.2 Prevailing Practices for Avoiding Complications 
Associated with Index Bias 


I noted at the beginning of this chapter that most segregation researchers are aware 
of the problem of index bias and based on concern about this potential problem they 
routinely adopt strategies to minimize its undesirable consequences. This represents 
a practical compromise between the ideal of assessing and dealing with bias directly 
at the point of measurement — which until now has not been possible — and forego- 
ing segregation research altogether. Researchers thus face the dilemma that segrega- 
tion is an important social phenomenon that warrants sustained investigation but 
methodological studies establish that bias can distort segregation index scores and 
have adverse impacts on results and findings. Because direct solutions to this prob- 
lem have not been available, researchers have adopted two general approaches for 
coping with concerns about index bias. One is to identify and avoid using especially 
problematic cases. The other is to differentially weight cases to try to minimize the 
impact of problematic cases. 

Surprisingly, researchers almost never use direct methods of assessing bias to 
identify potentially problematic cases. This is difficult to understand and raises the 
question of why researchers use inferior proxy approaches instead of more rigorous 
methods. Computation-intensive bootstrap methods, which arguably yield the best 
estimates of E[*], are relatively new and arguably are too demanding for general 
use. But analytic methods for assessing E[*] set forth in Winship (1977) have rigor- 
ous foundations and are easy to implement. It would seem that these methods pro- 
vide an obvious and compelling option for identifying segregation comparisons that 
are most likely to be distorted by bias. Nevertheless, researchers instead rely on 
informal “rules of thumb” to screen cases. These informal methods tend to be crude 
and imprecise in comparison to available analytic methods for directly evaluating 
E[*]. Common examples include the following practices. 


• Restrict segregation studies to comparisons involving broad population groups; 
avoid comparisons involving small populations or subgroups within broader 
populations. 

• Assess segregation using larger spatial units such as census tracts; avoid smaller 
spatial units such as census blocks or census block groups. 

• Restrict segregation studies to only comparisons where group ratios are relatively 
balanced and avoid comparisons where group ratios are highly 
unbalanced. 

• Assess segregation using full count (100%) data; avoid sample data. 

• Weight cases differentially, discounting cases presumed to be distorted by 
bias, when performing statistical analyses assessing variation in segregation 
over time or across groupings of cases and when performing regression analyses 
investigating cross-area variation in segregation. 


The practices just listed are not necessarily all implemented in every study and the 
individual practices are not always implemented in exactly the same way. But 
almost all empirical studies adopt some combination of multiple practices similar to 
the ones listed above. The best justification one can offer for these “rule-of-thumb” 
practices for dealing with index bias is that, while they are not necessarily optimal, 
they are easy to implement and may be useful. 


14.2.1 Unwelcome Consequences of Prevailing Practices 


Researchers adopt the practices just described with the best of intentions and the 
practices probably do provide a measure of protection from situations where unde- 
sirable consequences of index bias are especially great. My concern is that segrega- 
tion studies rely too heavily and uncritically on these informal practices. One basis 
for my concern can be expressed in the simple question, “Is there compelling evi- 
dence to indicate that the practices are effective in accomplishing the intended goal 
of eliminating undesirable impacts of index bias?” Unfortunately, the answer is “no, 
not really.” The practices are appropriately characterized as rough-and-ready “rules- 
of-thumb” whose efficacy has not been established by rigorous methodological 
studies. 

I comment on these issues further in the next section to explain the points more 
carefully. But I should note here that I see these issues as secondary because it is 
easy to imagine substituting better practices. The more serious concern is that even 
if these prevailing practices for dealing with the problems associated with index 
bias are refined to work as well as possible they still have the undesirable conse- 
quence of restricting the scope of segregation studies. This issue is insidious because 
it is less obviously “visible.” But its impact on segregation research is substantial 
and far reaching. 

Importantly, this undesirable consequence is not reduced when one adopts more 
rigorous practices for diagnosing situations where index bias is likely to be prob- 
lematic. The practices researchers adopt to avoid problems associated with index 
bias make it impossible to conduct many studies that researchers would otherwise 
undertake if index bias were not a concern. The following is a list of research topics 
that are of clear scientific interest but currently are “off limits” because prevailing 
practices for dealing with index bias will preclude analyses that could address ques- 
tions relating to these topics. 


• studying segregation at finer levels of neighborhood resolution, for example by using 
small spatial units such as census blocks, 

• studying segregation in smaller metropolitan areas and non-metropolitan areas 
(because segregation in these areas can only be captured well using smaller spatial 
units such as blocks), 

• studying segregation involving populations that are small in absolute size such as 
Asian and Latino subgroups (e.g., Vietnamese or Salvadoran) or “first settler” 
and early arriving Latino and Asian populations in new destination 
communities, 

• studying segregation between population subgroups based on social characteristics 
such as education, income, family/household type, or other similar characteristics, 
especially considered in combination, and 

• studying segregation involving groups that differ substantially in relative size. 


As the situation currently stands, these and many other kinds of studies are pre- 
cluded due to researchers’ concerns that index scores obtained for the comparisons 
involved cannot be trusted. The undesirable consequence of this is that the research 
literature is severely skewed toward examining a narrow subset of segregation com- 
parisons that survive a gauntlet of restrictions placed on group comparisons, analy- 
sis samples, and study design (e.g., size of spatial unit). Accordingly, most empirical 
studies of segregation in the contemporary literature focus on tract-level segregation 
for large metropolitan areas and on group comparisons involving minority popula- 
tions that are large in terms of both absolute and relative group size. Of course these 
cases are important and sociologically interesting in their own right. But researchers 
should not lose sight of the fact that this is a narrow subset of cases and is not rep- 
resentative of the full range of situations and group comparisons that research would 
consider if study designs were not narrowly restricted to reduce concerns about 
index bias. 
This raises the concern that our understanding of segregation patterns is based on
a particular subset of cases and comparisons chosen for practical, not theoretical 
and substantive, reasons. Equally importantly, it raises the related concern that 
researchers cannot undertake studies of segregation in many situations that have 
potentially important value for understanding segregation dynamics. For example, 
it is of obvious scientific interest to study the trajectory of segregation over time for 
new immigrant populations. But this currently is not possible because prevailing 
restrictions on study designs preclude the possibility of assessing segregation in the 
early stages of this process when the group is small in both absolute and relative 
size. 

In some areas of inquiry the impact of concerns about index bias on the scope of
segregation studies is pervasive. One example is the near-total disappearance
from the literature of studies that assess segregation at smaller spatial
scales. Analysis of segregation based on block-level data once was common
(Taeuber 1964; Taeuber and Taeuber 1965; Sorenson et al. 1975; Schnore and 
Evenson 1966; Farley and Taeuber 1968, 1974; Roof and Van Valey 1972; Van Valey 
and Roof 1976). Nowadays it is rare. 

This change in the literature is not based on theoretical or substantive concerns. 
To the contrary, assessing segregation at small spatial scales has obvious substantive 
value because it can potentially detect segregation that might otherwise be missed. 
Accordingly, block-level analysis is better suited for studying the emergence of 
segregation patterns for newly arriving migrant or immigrant populations because 
patterns of segregation during their initial settlement would not be evident if
segregation is measured using larger units such as census tracts.⁷ Similarly, block data are
relevant for nonmetropolitan areas and non-core counties where census tracts are 
too large to sustain meaningful segregation analysis. But contemporary empirical 
studies rarely investigate segregation using block data. It is not because segregation 
in these settings just mentioned is substantively unimportant or scientifically unin- 
teresting. Instead, it is because segregation study designs have “retreated” to sup- 
posedly safer ground to avoid the complications of index bias that arise when 
measuring segregation based on small areas. The unfortunate byproduct of this is 
that it has inhibited the investigation of segregation in smaller cities and 
communities. 

Another closely related example is that empirical segregation studies systemati- 
cally avoid examining segregation in metropolitan areas where one of the popula- 
tions in the analysis is a relatively small proportion of the population or is small in 
absolute population size. For example, Farley and Frey’s (1994) influential study of 
trends in segregation from Whites for Blacks, Latinos, and Asians restricted its anal- 
ysis to metropolitan areas where the minority group in the comparison either reached 
20,000 in overall population or represented 3 % or more of the city population. As a 
result, out of 318 total metropolitan areas, their analysis included only 232 areas for 


7 Lichter et al. (2010) used block-level data to study White-Latino segregation in new destinations.
They offered compelling arguments for why block-level data were necessary. But they did not
address the problem of index bias.
White-Black segregation, only 153 areas for White-Latino segregation, and only 66
areas for White-Asian segregation. 

The metropolitan areas excluded from comparison were those for which the 
minority group was small in relative and/or absolute size. Many of the excluded 
cases have non-negligible populations for the groups in question and ideally would be
included in studies investigating how segregation varies with basic factors such as 
size of city, relative group size, and trends in absolute and relative group size. 
However, since prevailing practices exclude cases over key ranges of these vari- 
ables, many interesting research questions cannot be addressed. 

Similar consequences are seen in studies of segregation among subgroups within 
various minority populations. For example, in a study of segregation patterns for 
five Asian-origin groups (Chinese, Japanese, Korean, Vietnamese, and Asian 
Indian), Massey and Denton (1992) restricted their analysis to metropolitan areas 
where the size of the Asian-origin group in question was 5,000 or higher. This lim- 
ited the scope of their analysis to no more than 11 metropolitan areas for any single 
group. In addition, they reported segregation scores only for group comparisons 
where both groups in the segregation comparison had 5,000 persons and this elimi- 
nated 20-30 % of possible comparisons involving other Asian-origin groups. They 
explicitly justified these restrictions in terms of concerns about index bias, stating:
“Since the index of dissimilarity is inflated by random variation when group sizes 
get small (Massey 1978), we only compute indices when the group size in the 
SMSA exceeds 5,000” (Massey and Denton 1992: 171). Massey and Denton are 
clear that they did not adopt these restrictions on study design based on theoretical 
interest or other substantive concern but rather adopted the restrictions solely as a 
means of guarding against adverse consequences of index bias. 

A final example I note is the impact on research examining racial segregation 
between racial groups after they have been secondarily grouped on socioeconomic 
status or other social characteristics relevant for group differences in residential 
distributions. Empirical investigations of this type routinely limit their analyses to a 
handful of very large cities. Furthermore, to proceed with analysis in this small 
subsample of cities they collapse the detailed data on socioeconomic characteristics 
(e.g., income) into a small number of broad groupings (e.g., 3-5 categories). Again, 
these restrictions in study design are adopted primarily to avoid complications asso- 
ciated with index bias. Evidence of this is found in the following statements from 
two important studies investigating racial-ethnic segregation across socioeconomic 
standing. 


Since the number of minority members is small in some socioeconomic categories, particu- 
larly those at the upper end of the socioeconomic spectrum, we focus attention on three sets 
of 20 SMSAs that have the largest numbers of blacks, Hispanics, and Asians ... Focusing 
on the top 20 SMSAs for each group maximizes the number of minority members within 
each socioeconomic category and increases the stability of the segregation indices. (Denton 
and Massey 1988: 799-800) 

Since dissimilarity indices become unreliable and difficult to interpret when the number of
minority members is very small (Massey 1978), we only compute figures for those
metropolitan areas where the minority population reached 5,000. (Massey and Fischer 1999: 318)


The several examples reviewed above illustrate that empirical studies of segrega- 
tion routinely adopt restrictions on study designs to avoid situations where index 
bias can complicate assessments of the level of segregation and its variation across 
cases. In the absence of better alternatives for dealing with index bias, these prac- 
tices can perhaps be seen as necessary precautions. Nevertheless, it is important to 
recognize that the practices have many unwelcome consequences and it would be 
more desirable to have unbiased versions of indices of uneven distribution so the 
current restrictions on the scope of segregation studies can be relaxed.


14.2.2 Efficacy of Prevailing Practices: Screening Cases 
on Minority Population Size 


In the ideal, the practices researchers adopt to minimize complications associated 
with index bias would have clear rationales and be established as effective by rigor- 
ous methodological studies. One approach would be to identify potentially prob- 
lematic cases by using either analytic formulas (Winship 1977) or bootstrap methods 
(e.g., Carrington and Troske 1997; Allen et al. 2009; Mazza and Punzo 2015). For 
example, one might require that values of E[D] be below some value
deemed “acceptable” (say, 3-5 points). But empirical studies of segregation do not
screen cases this way, nor do they report the levels and ranges of E[D] for the cases
in the analysis sample.
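
A minimal sketch of this kind of screening is given below in Python. It estimates E[D] by repeatedly assigning persons to areas at random while holding area sizes fixed, which is one simple way to represent a baseline of random distribution; it is not Winship's analytic formula, nor the specific bootstrap procedures of the studies cited. The function names, the toy city of 50 equal-sized areas, and the 5-point cutoff are illustrative assumptions only.

```python
import random
from collections import Counter


def dissimilarity(minority, majority):
    """Index of dissimilarity D (0-1 scale) from per-area counts of two groups."""
    m_total, w_total = sum(minority), sum(majority)
    return 0.5 * sum(abs(m / m_total - w / w_total)
                     for m, w in zip(minority, majority))


def expected_d_random(area_totals, n_minority, reps=200, seed=0):
    """Estimate E[D] by simulation: place n_minority persons at random
    (without replacement) into areas whose two-group totals are fixed."""
    rng = random.Random(seed)
    # One slot per person, labelled with the index of the area it belongs to.
    slots = [i for i, t in enumerate(area_totals) for _ in range(t)]
    total = 0.0
    for _ in range(reps):
        picked = Counter(rng.sample(slots, n_minority))
        minority = [picked.get(i, 0) for i in range(len(area_totals))]
        majority = [t - m for t, m in zip(area_totals, minority)]
        total += dissimilarity(minority, majority)
    return total / reps


# Hypothetical city: 50 areas of 40 persons each, 100 minority persons (5%).
areas = [40] * 50
e_d = expected_d_random(areas, n_minority=100)
flagged = 100 * e_d > 5  # hypothetical "acceptable" cutoff of 5 points
print(f"E[D] = {100 * e_d:.1f} points; flagged as problematic: {flagged}")
```

A study could report the resulting E[D] values alongside D for every case, which is the kind of direct diagnostic that screening on group-size thresholds only approximates.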

Instead, empirical studies rely on informal practices such as screening cases 
based on “thresholds” on absolute and relative group size. The potential concern is 
that this is an imprecise way to screen problem cases. I explored the issue empiri- 
cally using a data set with observations on White-Minority segregation for CBSAs 
in 1990, 2000, and 2010. I screened cases requiring that each case have at least 
2,500 persons in both groups in the decade of observation and with the smaller 
group in the comparison comprising at least 3% of the combined group total.⁸
Screening criteria similar to these are routine in empirical studies. Their application 
here yielded an analysis data set with 3,570 cases. 

This result itself deserves comment. Relaxing the case selection criteria to 
require cases to have only 500 persons and for the smaller group in the comparison 
to comprise only at least one-half of one percent of the combined group total would 
yield 6,655 cases. The additional 3,085 cases would be highly relevant for assessing 
how segregation compares in smaller communities and communities where one 


8 The data set included White-Black, White-Latino, and White-Asian comparisons.
group in the comparison is small in relative size. This could apply, for example, to
establishing “baselines” for White-Latino segregation in micropolitan areas and 
non-core counties of the Midwest and South that emerged as new destination com- 
munities experiencing Latino population growth during the period 1980-2000. 
Current practices do not permit these cases to be considered. The unbiased indices 
I introduce in Chap. 15 make it possible for researchers to focus on these communi- 
ties using spatial units as small as blocks (instead of tracts) if they wish to do so. 

For each segregation comparison I calculated the value of D and estimates of bias 
based on values of E[D] obtained using both Winship’s analytic formulas and also 
by bootstrap methods. The question I address is whether the restrictions on the study 
design and analysis sample yield an analysis data set where concern about bias is 
negligible. The main conclusions are the same whether using either set of estimates 
of E[D] so I report results for E[D] computed by formula because few researchers 
are likely to compute bootstrap estimates in empirical studies. I first consider results 
when segregation is assessed using tract-level data, the most conservative choice for 
minimizing potential bias. Here the mean for E[D] was 7.36. Equally and perhaps 
more importantly, its values displayed considerable variation across cases with an 
inter-decile range of 8.86 with 10 % of cases at or below 3.74 and 10% of cases at 
or above 12.60. So the first takeaway point is that the screening criteria did not 
reduce the typical potential for bias to negligible levels. A second takeaway point is 
that screening cases did not yield an analysis data set where the potential for bias is 
uniform across cases. This is not surprising because relative group size is an
important determinant of E[D] and it varies widely across cities even after screening 
out cases where percent minority is below 3 %. 

Another finding is that the level of underlying potential for bias in D varies across 
group comparisons. The mean for E[D] is 6.10 for White-Black segregation, 7.02 
for White-Latino segregation, and 10.86 for White-Asian segregation. The cross- 
group variation traces to the fact that, on average, the relative size of the minority 
population is smaller for the comparisons involving Latinos and even more so for 
comparisons involving Asians. This raises concerns that bias might distort cross- 
group comparisons on segregation. The means on D are 48.48 for the White-Black 
comparisons, 35.13 for the White-Latino comparisons, and 39.21 for the
White-Asian comparisons. It is interesting to observe that the difference of 3.84 between
the White-Asian and White-Latino averages for E[D] is almost as large as the differ- 
ence of 4.08 between the White-Asian and White-Latino averages for D. 

The important point here is that the conventional approach to screening cases 
does not do away with nagging concerns about the potential role of bias. Furthermore, 
these results only get worse when segregation is measured using data at lower levels 
of geography such as for block-groups and blocks. For example, when calculated 
using block-level data, the means for E[D] are 21.98 for White-Black segregation, 
35.13 for White-Latino segregation, and 39.21 for White-Asian segregation. The 
results for E[D] also varied considerably across areas and across group comparisons 
as observed for E[D] computed using tract-level data. 

14.2.3 Efficacy of Prevailing Practices: Weighting Cases
by Minority Population Size 


Researchers often are aware of concerns that index bias can distort results even after 
applying sample restrictions aimed at excluding the most problematic cases. In 
many studies researchers address this concern by weighting cases by minority pop- 
ulation size for the city when performing statistical analyses such as computing 
summary statistics (e.g., means) for groups of cases or estimating regression equa- 
tions. Unfortunately, the efficacy of this strategy is not rigorously established. 

The practice is sometimes described as being an appropriate way to deal with 
“unreliable” cases but this rationale is open to question. Cases with biased index 
scores are not “unreliable” in the usual statistical sense of that term. To the contrary, 
biased index scores are highly reliable in the sense of yielding consistent results 
under given study conditions. The problem is not that the scores are inconsistent; 
the problem is that they are consistently high; that is, they are reliable but still 
untrustworthy because they are biased upward. 

Weighting cases by minority population size does not “correct” the higher and 
potentially misleading index scores that may result from bias for some cases. So 
what does the practice accomplish? One clear consequence is to strongly skew anal- 
ysis results in the direction of reflecting segregation patterns found in cities that 
have large minority populations. In most studies this means that a relatively small 
subset of cases will receive larger weights and have a disproportionate influence on 
results of statistical analyses. In contrast, a larger number of remaining cities will 
receive smaller weights and have modest-to-negligible influence on results. This 
amounts to reducing the “nominal” sample size for the macro units (usually cities) 
as results will be similar to those obtained when excluding cases with small minority
populations. 
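
The concentration of influence can be made concrete with a small calculation. The sketch below uses Kish's approximate effective sample size, (Σw)² / Σw², a standard survey-sampling summary that is brought in here for illustration and is not a quantity used in the analysis above. The ten minority population counts are invented.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def share_of_top(weights, k):
    """Fraction of total weight held by the k most heavily weighted cases."""
    ranked = sorted(weights, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical minority population sizes for ten cities, used as case weights.
minority_pop = [900_000, 400_000, 150_000, 60_000, 25_000,
                12_000, 8_000, 5_000, 4_000, 3_000]

n_eff = effective_sample_size(minority_pop)
top2 = share_of_top(minority_pop, 2)
print(f"nominal n = {len(minority_pop)}, effective n = {n_eff:.2f}")
print(f"share of total weight held by the two largest cities = {top2:.2f}")
```

With weights this uneven, the effective number of cases is far below the nominal ten, which is the sense in which weighting resembles dropping the cities with small minority populations.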

Minority population size is at best only a crude proxy for bias potential (i.e.,
E[D]). Accordingly, screening and weighting on this item can introduce at least two 
kinds of distortions to results. Holding relative group size constant, many smaller 
cities will be discounted or excluded from the analysis altogether when more careful 
diagnostic analysis would show that their index scores are as trustworthy as those 
for larger cities (because bias is intrinsically related to relative group size, not to 
absolute size). The practical result is that weighting cases to protect against bias will 
tend to be “hit and miss” in effectiveness but the practice will definitely skew results 
to more closely reflect segregation patterns for cities with large minority 
populations. 

The main point is that the current approach of guarding against undesirable
consequences of bias based on using informal proxy criteria is open to question. Moreover,
even if problematic cases were identified more carefully (e.g., using bootstrap meth- 
ods to estimate E[D]), an important underlying problem would remain; current 
practices do not correct flawed scores so the cases can be trusted and used in the 
analysis. Instead, the cases that are impacted by index bias are excluded or dis- 
counted and analysis results thus reflect segregation patterns observed for a small 
subset of cases that are not adversely impacted by bias. This is hardly an ideal study
design. These cases, while important in their own right, are not necessarily repre- 
sentative. So one is left hoping, but not knowing, that “true” segregation patterns in 
the large fraction of cases that are excluded or discounted do not differ from the 
segregation patterns in the smaller subset of cases that dominate the analysis results. 


14.2.4 An Aside on Weighting Cases by Minority 
Population Size 


Statistical theory provides a different and potentially defensible rationale for case 
weighting when performing statistical analyses of variation in segregation across 
cities and communities. It is that the dependent variable (i.e., the index score) exhib- 
its differential variability across cities. The relevant statistical issue is heteroskedas- 
ticity — a violation of the ordinary least squares (OLS) regression assumption that 
error variance is constant across cases. This issue is distinct and separate from index 
bias. Index bias is systematic with regard to the direction of its impact on index 
scores; biased cases have consistently inflated values for index scores. In contrast, 
heteroskedasticity does not involve bias; it involves greater volatility in scores 
around the model-predicted average and the volatility reflects scores that are below 
the predicted average as well as scores that are above the predicted average. When 
heteroskedasticity is present, estimates of means and regression coefficients are 
unbiased but significance tests in OLS regression may be questioned because the 
assumptions underlying the tests are not met. 

One strategy for dealing with heteroskedasticity in aggregate-level regressions is 
to perform weighted least squares (WLS) regression using case weights (w) that are 
proportional to the inverse of each case’s expected error variance (Hanushek and 
Jackson 1977). Statistical theory indicates the appropriate weight (w) would be the 
reciprocal of the expected error variance of D. This can be calculated directly.⁹ But
some might view absolute size of the minority population as a potentially accept- 
able proxy and defend weighting cases by population size on this count. 
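
A minimal sketch of the direct calculation follows, treating D as a difference of two proportions as described in the accompanying footnote. The function names and the symbols p1, n1, p2, n2 (the proportion and size for each group) are labels chosen here for illustration, and the input values are hypothetical.

```python
def d_error_variance(p1, n1, p2, n2, pooled=False):
    """Expected sampling variance of a difference of proportions (p1 - p2).

    Unpooled: p1*q1/n1 + p2*q2/n2.
    Pooled:   p*q*(1/n1 + 1/n2), with p = (p1*n1 + p2*n2) / (n1 + n2).
    """
    if pooled:
        p = (p1 * n1 + p2 * n2) / (n1 + n2)
        return p * (1.0 - p) * (1.0 / n1 + 1.0 / n2)
    return p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2


def wls_weight(p1, n1, p2, n2, pooled=False):
    """WLS case weight: the reciprocal of the expected error variance."""
    return 1.0 / d_error_variance(p1, n1, p2, n2, pooled)


# Hypothetical city: 10,000 persons in group 1 and 500 in group 2.
variance = d_error_variance(0.7, 10_000, 0.3, 500)
weight = wls_weight(0.7, 10_000, 0.3, 500)
print(f"variance = {variance:.6f}, weight = {weight:.1f}")
```

The weight depends on both group sizes and on the proportions themselves, so minority population size alone can only approximate it.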

This would perhaps be justified if variation in index scores was greater when 
minority population size is small. But empirical analysis suggests this is not the 
case. This is due to two reasons, one simple and one complex. I explored the issue 
by examining the empirical associations among three variables — the score for D, 
predisposition for bias measured by E[D], and minority population size — using the 
data set and measures introduced and described in the previous section. The simple 


9 D is the White-Black difference of proportions ($p_1 - p_2$) residing in areas where proportion
White for the area equals or exceeds that for the city as a whole. The expected variance ($\sigma^2$) of a
difference of proportions is obtained by squaring the standard error of the difference of
proportions, $\sigma(p_1 - p_2)$, given by $\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$ or, alternatively,
$\sqrt{pq(1/n_1 + 1/n_2)}$ if using the pooled calculation of $p = (p_1 n_1 + p_2 n_2)/(n_1 + n_2)$.
part of the story is that values of D do not display heteroskedasticity in relation to
minority population size. More specifically, dispersion in the values of D around the 
mean is relatively constant across minority population size so there is no obvious 
empirical basis for weighting cases by minority population size to compensate for 
heteroskedasticity. 

The complex part of the story is that predisposition for bias (i.e., E[D]) is
moderately and inversely correlated with minority population size.¹⁰ This might lead one
to expect that dispersion in residuals would be larger when minority population size 
is small. Instead, however, the dispersion in residuals for D is lower, not higher, 
when E[D] is high. This is because index bias raises the “floor” for D since bias 
precludes low scores. This then truncates the range of variation in D in comparison 
to the range of variation in D when E[D] is low. 

Since the argument for weighting cases by minority population size to deal with 
the statistical issue of heteroskedasticity is weak, it is appropriate to ask whether the 
practice is warranted on any basis. The best one can say in defense of the practice is 
that it may tend to reduce the influence of cases that on average have higher levels 
of bias (i.e., higher values on E[D]). But this purpose could be better served by 
establishing weights based on direct assessments of bias. However, even if case 
weights were well-calibrated to reflect bias, the practice of down-weighting cases 
proportional to bias is a weakly justified ad hoc procedure. It does not “repair” or 
“correct” inflated index values for individual cases. Misleading cases remain mis- 
leading. What the practice does accomplish is to minimize the influence of poten- 
tially misleading scores when they are averaged in with other scores that are viewed 
as less misleading. 

If the rationale for case-weighting is not particularly strong, is it at least benign? 
This question is hard to answer. One thing is clear; weighting by minority popula- 
tion size skews results toward patterns of segregation observed in cities with large 
minority populations. This is definitely a non-representative subset of cities dispro- 
portionately including large cities and medium-sized cities where percent minority 
is higher. Whether this influences findings in undesirable ways or not is unclear and 
may depend on the question being addressed. If one is investigating patterns and
variation in segregation for all cities (that is, seeking to understand how segregation
varies across cities based on urban-ecological factors such as population size, racial
composition, and population growth), equal weighting of all cases is more appropriate.
Weighting cases by minority population shifts the focus away from outcomes for all 
cities and toward outcomes for minority individuals residing in cities with large 
minority populations. Skewing results in this way may be tolerable for some 
research questions. But it would be best for researchers who use these practices to 
acknowledge the issue and reflect on how findings might be affected. 


10 This is because absolute minority size has a moderate association with relative group size which 
is intrinsically related to index bias for D and all other indices of uneven distribution except S. 

14.2.5 Summing Up Comments on Prevailing Practices


In this section I have argued that the research designs of empirical studies of resi- 
dential segregation are shaped in important ways by researchers’ concerns about the 
possible undesirable consequences of index bias. Motivated by these concerns, and 
with the best of intentions, segregation researchers routinely adopt a variety of 
informal practices such as restricting analysis samples to exclude cases where they 
suspect bias may render index scores untrustworthy and differentially weighting 
remaining cases when conducting statistical analyses. The goal is to minimize the 
potentially undesirable impacts of bias on index scores for cases that are included in 
the analysis sample. 

I raised concerns that the efficacy of this patchwork of informal practices is open 
to question on various counts not the least of which being that bias is “flagged” by 
crude proxies instead of by using best available direct approaches for diagnosing the 
potential for bias. In the final analysis, I argued that the greater concern is that, even 
if these prevailing practices for dealing with index bias are refined and improved, 
they would continue to have an important but largely unappreciated undesirable 
consequence. This is that the practices narrow the scope of segregation studies in 
two important ways. First, they restrict empirical analysis to a subset of potentially 
non-representative cases and group comparisons where index scores are presumed 
to be less problematic. Second, they eliminate the possibility of investigating many 
important research questions that involve situations where standard indices are 
viewed as prone to non-negligible bias. 

Based on this I argue that the most desirable strategy all around is to deal with 
bias at the point of measurement and obtain index scores that are not distorted by 
index bias. Having unbiased index scores would make it possible to use individual 
cases “as is”. It would eliminate the need to screen and exclude cases due to con- 
cerns about bias. It would eliminate the need to use weighting procedures to mini- 
mize the influence of cases with biased scores on results of statistical analyses. The 
attractiveness of this kind of solution has not been overlooked. But past efforts to 
deal directly with index bias at the point of measurement have not gained accep- 
tance. I review the reasons for this in the next section. 


14.3 Limitations of Previous Approaches for Dealing 
Directly with Index Bias 


The potential benefits of dealing directly with the index bias at the point of measure- 
ment have not gone unrecognized and a variety of suggestions for developing unbi- 
ased versions of segregation indices have been offered over the decades. To this 
point, however, none of these suggestions has gained wide acceptance in empirical 
research. The kind of approach proposed most often is to adjust scores of standard 
versions of indices downward to eliminate the impact of upward bias
associated with their expected values under a baseline model of random distribution
(e.g., Cortese et al. 1976; Winship 1977; Farley and Johnson 1985; Carrington and 
Troske 1997; Allen et al. 2009; Mazza and Punzo 2015). For example, Winship 
(1977) and Carrington and Troske (1997) have proposed a relatively simple
“norming” adjustment that has intuitive appeal.¹¹ They propose calculating “unbiased” or
“bias adjusted” scores for D, designated here as D*, based on the following 
calculation. 


$$D^{*} = \frac{D - E[D]}{1 - E[D]}$$


The justification for the calculation is that the value obtained indicates the degree to 
which observed departure from even distribution (D) exceeds the departure
expected under a baseline model of random distribution (i.e., E[D]). In principle this 
adjustment can be applied to any index of uneven distribution for which the expected 
value under random distribution (E[*]) can be estimated. 
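
The adjustment is simple to compute once E[D] is in hand, as the sketch below suggests. The function name is hypothetical. The first pair of inputs reuses the tract-level White-Latino means for D and E[D] reported in Sect. 14.2.2, rescaled to the 0-1 range, purely as convenient numbers; applying the adjustment to means rather than to individual cases is a simplification. The second pair is invented to show how a negative value can arise.

```python
def bias_adjusted_d(d, expected_d):
    """Return D* = (D - E[D]) / (1 - E[D]); both inputs on the 0-1 scale."""
    if not 0.0 <= expected_d < 1.0:
        raise ValueError("expected_d must be in the interval [0, 1)")
    return (d - expected_d) / (1.0 - expected_d)


# Illustration using the reported means (D = 35.13, E[D] = 7.02 points).
typical = bias_adjusted_d(0.3513, 0.0702)

# Hypothetical case where observed D falls below its expected value.
below_baseline = bias_adjusted_d(0.05, 0.08)

print(f"D* (typical case)      = {typical:.3f}")
print(f"D* (D below E[D] case) = {below_baseline:.3f}")
```

Whenever observed D is smaller than E[D] the numerator is negative, so D* falls below zero; complete segregation (D = 1) maps to D* = 1 regardless of E[D].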

Unfortunately, conceptual and practical issues have worked against wide adop- 
tion of this procedure. Regarding conceptual issues, the interpretation of D* is more 
technical and abstract than the interpretation of the conventional version of D. For 
example, negative values are possible and, while this is a valid result under the pro- 
cedure, it is unsettling to many researchers. This negates one of the appealing 
aspects of D; namely, the ease with which its interpretation can be conveyed to 
broad audiences as well as professional audiences. Regarding practical issues, the 
method requires estimating E[D] as part of the analysis. In principle this can be 
accomplished using either analytic formulas or bootstrap simulation methods. But 
so far these options have not been embraced by segregation researchers due at least 
in part to the technical and computational burdens associated with estimating E[D]. 

Prospects for adoption of this approach in the future are poor. One reason for this 
is that the formula-based methods for estimating E[D] that are most easily implemented
can perform poorly when the full population of the city includes groups other than 
the two groups in the segregation comparison. Unfortunately, this condition is com- 
mon in many research situations. It undercuts the potential value of using simple 
formula-based approaches to estimating E[D] because estimated values tend to be 
too high and in turn can lead values of D* to be too low because the adjustment to 
remove the impact of index bias is too aggressive. Until now this problem has gone 
unnoticed in the literature. In principle, the problem can be overcome by drawing on 
refined versions of formula-based estimates of E[D] or using estimates based on 
bootstrap simulation methods, but the complexity and increased computational burden
associated with these superior approaches to estimating E[D] make it unlikely
researchers will adopt these options. 


11 Allen and colleagues (2009) also suggest a similar strategy.


14.4 Summary 


In this chapter I pointed out that empirical studies of residential segregation are 
strongly influenced by concerns about index bias. These concerns are reflected in 
the study designs researchers adopt and in the methods of statistical analyses 
researchers use. One important consequence of this is that researchers carefully 
avoid studying segregation in situations where they suspect bias will render scores 
of standard versions of indices of uneven distribution untrustworthy. Accordingly, 
they avoid studying group comparisons involving small groups; they avoid studying 
group comparisons where groups are imbalanced in size; they avoid measuring seg- 
regation using smaller spatial units such as census blocks; and they avoid examining 
segregation in smaller communities. Even after adopting these restrictions on study 
design, researchers continue to have concerns that bias makes some index scores 
untrustworthy. Analysis reviewed in the chapter shows their concern is well justified. 
Motivated by these concerns, researchers routinely weight cases differentially 
based on minority population size when performing statistical analyses on the 
assumption that this will minimize the impact cases with scores inflated by bias will 
have on results. In a very real sense this has the practical effect of reducing the 
sample size even further and skewing it toward a non-random subset of cases. Taken 
collectively, these several practices limit the scope of segregation studies so atten- 
tion is focused disproportionately on patterns of segregation for large metropolitan 
areas with minority populations that are large in absolute and relative terms. And 
even among this subset of cases, results of statistical analyses disproportionately 
reflect segregation patterns for cities with larger minority populations. 

The adoption of these practices is well intentioned. But the current state of affairs 
is far from ideal. As things currently stand, even after restricting study designs to 
avoid problematic cases, researchers remain less than confident about scores for the 
individual cases in their studies and routinely weight cases differentially when per- 
forming statistical analysis to minimize the impact of index bias. This concern com- 
plicates elementary tasks in segregation analysis such as being confident about the 
index score for a given case, or comparing scores for two cases, or following the 
score for a single case over time. More importantly, concern about index bias leads 
researchers away from investigating segregation in a wide range of situations that 
would be theoretically relevant and sociologically interesting if index scores could 
be trusted. 

The better alternative is to deal with the problem of index bias directly at the point 
of measurement. Previous suggestions for accomplishing this task have involved 
applying after-the-fact adjustments to standard versions of index scores. These “bias 
adjusted” indices have never gained wide usage. In part this is because they have 
involved complex and often computationally demanding procedures. In addition, 
many researchers find the resulting measures to be unfamiliar and therefore more 
difficult to interpret and explain to nontechnical audiences. Finally, researchers simply 
have not yet been convinced that the approach of applying corrective adjustments 
to standard index scores yields robust and effective results over the wide range of 
situations encountered in “real world” empirical studies. 

In the next chapter I introduce a new solution for moving beyond the current 
unsatisfactory situation. By drawing on the difference of means formulation of indi- 
ces of uneven distribution, I identify new insights about the nature of index bias that 
make it possible to address index bias at the point of measurement. The insight is 
that, when segregation is cast as a group difference on average levels of scaled 
group contact, bias can be traced to a relatively simple source; namely, how group 
contact with the reference group is impacted by self-contact which inherently dif- 
fers for the reference group and the comparison group. Eliminating self-contact 
from index calculations by assessing group contact based on “neighbors” instead of 
“area population” eliminates this inherent source of bias in index scores. Chapter 15 
reviews the basis for establishing unbiased versions of popular indices. Chapter 16 
reviews the performance of the “unbiased” versions of popular indices to establish 
that, as desired, they have expected values of zero under random assignment. It also 
makes the case that the new measures allow researchers to use familiar indices with 
greater confidence and dispense with most of the ad hoc practices that currently 
restrict the scope of segregation studies. 


References 


Allen, R., Burgess, S., & Windmeijer, F. (2009). More reliable inferences for segregation indices. 
Working Paper Series no. 09/2016. Bristol: Centre for Market and Public Organisation, Bristol 
Institute of Public Affairs, University of Bristol. 

Blau, F. (1977). Equal pay in the office. Lexington: Lexington Books. 

Boisso, D., Hayes, K., Hirschberg, J., & Silber, J. (1994). Occupational segregation in the multidi- 
mensional case: Decomposition and tests of significance. Journal of Econometrics, 61, 
161-171. 

Carrington, W. J., & Troske, K. R. (1995). Gender segregation in small firms. Journal of Human 
Resources, 30(3), 503-533. 

Carrington, W. J., & Troske, K. R. (1997). On measuring segregation in samples with small units. 
Journal of Business & Economic Statistics, 15(4), 402-409. 

Cortese, C., Frank Falk, R., & Cohen, J. (1976). Further considerations on the methodological 
analysis of segregation indices. American Sociological Review, 41, 630-637. 

Cortese, C., Frank Falk, R., & Cohen, J. (1978). Understanding the standardized index of 
dissimilarity: Reply to Massey. American Sociological Review, 43, 590-592. 

Denton, N., & Massey, D. S. (1988). Residential segregation of Blacks, Hispanics, and Asians by 
socioeconomic status and generation. Social Science Quarterly, 69, 797-817. 

Falk, R. F., Cortese, C., & Cohen, J. (1978). Using standardized indices of dissimilarity: Comment 
on Winship. Social Forces, 57, 713-716. 

Farley, R., & Frey, W. H. (1994). Changes in the segregation of whites from blacks during the 
1980s: Small steps toward a more integrated society. American Sociological Review, 59, 23-45. 

Farley, R., & Johnson, R. (1985). On the statistical significance of the index of dissimilarity. In The 
proceedings of the social statistics section of the American Statistical Association (pp. 415- 
420). Washington, DC: American Statistical Association. 

Farley, R., & Taeuber, K. E. (1968). Population trends and residential segregation since 1960. 
Science, 159, 953-956. 


Farley, R., & Taeuber, A. F. (1974). Racial segregation in the public schools. American Journal of 
Sociology, 79, 888-905. 

Hanushek, E. A., & Jackson, J. E. (1977). Statistical Methods for Social Scientists. Orlando: 
Academic Press. 

Jahn, J., Schmid, C. F., & Schrag, C. (1947). The measurement of ecological segregation. American 
Sociological Review, 12, 293-303. 

Lichter, D. T., Parisi, D., Taquino, M. C., & Grice, S. M. (2010). Residential segregation in new 
Hispanic destinations: Cities, suburbs, and rural communities compared. Social Science 
Research, 39, 215-230. 

Massey, D. S. (1978). On the measurement of segregation as a random variable. American 
Sociological Review, 43, 587-590. 

Massey, D. S., & Denton, N. A. (1992). Residential segregation of Asian-origin groups in U.S. 
metropolitan areas. Sociology and Social Research, 76, 170-177. 

Massey, D. S., & Fischer, M. J. (1999). Does rising income bring integration? New results for 
Blacks, Hispanics, and Asians in 1990. Social Science Research, 28, 316-326. 

Mazza, A., & Punzo, A. (2015). On the upward bias of the dissimilarity index and its corrections. 
Sociological Methods and Research, 44, 80-107. 

Ransom, M. R. (2000). Sampling distributions of segregation indices. Sociological Methods and 
Research, 28, 454-475. 

Reiner, T. A. (1972). Racial segregation: A comment. Journal of Regional Science, 19, 137-148. 

Roof, W. C., & Van Valey, T. L. (1972). Residential segregation and social differentiation in 
American urban areas. Social Forces, 51, 87-91. 

Schnore, L. F., & Evenson, P. C. (1966). Segregation in southern cities. American Journal of 
Sociology, 72, 58-67. 

Sorenson, A., Taeuber, K. E., & Hollingsworth, L., Jr. (1975). Indices of racial residential segrega- 
tion for 109 cities in the United States 1940 to 1970. Sociological Focus, 8, 125-142. 

Taeuber, K. (1964). Negro residential segregation: Trends and measurement. Social Problems, 12, 
42-50. 

Taeuber, K., & Taeuber, A. (1965). Negroes in cities: Racial segregation and neighborhood 
change. Chicago: Aldine Publishing Company. 

Taeuber, K., & Taeuber, A. (1976). A practitioner’s perspective on the index of dissimilarity: 
Comment on Cortese, Falk, and Cohen. American Sociological Review, 41, 884-889. 

Van Valey, T. L., & Roof, W. C. (1976). Measuring segregation in American cities: Problems 
of intercity comparison. Urban Affairs Quarterly, 11(4), 453-468. 

Winship, C. (1977). A re-evaluation of indices of residential segregation. Social Forces, 55, 
1058-1066. 

Winship, C. (1978). The desirability of using the index of dissimilarity or any adjustment of it for 
measuring segregation: Reply to Falk, Cortese, and Cohen. Social Forces, 57, 717-721. 

Zelder, R. E. (1972). Racial segregation: A reply. Journal of Regional Science, 19, 149-153. 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution- 
NonCommercial 2.5 International License  (http://creativecommons.org/licenses/by-nc/2.5/), 
which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any 
medium or format, as long as you give appropriate credit to the original author(s) and the source, 
provide a link to the Creative Commons license and indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter’s Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter’s Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Chapter 15 
New Options for Understanding and Dealing 
with Index Bias 


In this chapter I introduce a new approach for addressing the problem of index bias 
at the point of measurement. Specifically, I introduce new formulations of popular 
indices of uneven distribution that are free of bias and take expected values of zero 
when individuals and households are randomly assigned to residential locations. I 
accomplish this task by drawing on the difference of means formulations of segre- 
gation indices introduced in earlier chapters to first identify and then eliminate the 
root source of bias in standard versions of popular indices of uneven distribution. 
The crucial insight from the difference of means formulation is that the values for 
all popular indices of uneven distribution can be seen as resting on person-specific 
scores for pairwise group contact (p). Close consideration reveals that the source of 
index bias is found in these group contact scores. Happily, a surprisingly simple 
refinement in the calculation of these scores eliminates index bias. 

I review the root problem and its solution in more detail in the body of this chapter 
but offer a brief preview of the essence of the problem and the solution here. To 
begin, recall that the difference of means framework establishes that all popular 
indices of uneven distribution can be formulated in terms of group differences in 
scaled residential exposure or contact. More specifically, the score for a particular 
index of uneven distribution can be obtained by calculating the difference of group 
means on individual residential outcomes (y) scored using an index-specific scaling 
function $y = f(p)$. The input to the scaling function, “p”, is the individual’s level of 
pairwise contact with the reference group in the comparison. The value of p is calculated 
from the area population counts for the two groups in the segregation comparison 
based on $p_i = n_{1i}/(n_{1i} + n_{2i})$. This approach to calculating the value of p 
introduces inherent upward bias in group differences on scores for p and also group 
differences on scores of y. 

The source of bias is simple; the count terms (i.e., $n_{1i}$ and $n_{2i}$) used in the calculation 
of group contact ($p_i$) include the individual in question. The score for contact 
thus combines two components of contact: contact with self and contact with 
neighbors. For any individual the component of contact that derives from contact 
with neighbors can vary widely; it can range from no (0%) contact with the reference 
group to only (100%) contact with the reference group. In principle, this component 
of contact can be random for any individual regardless of group membership. 
Thus, under random assignment the expected value of this component of contact 
will be the same for every individual regardless of group membership and expected 
group differences will be zero (0). In contrast, the component of contact that derives 
from self-contact cannot be randomly assigned; it is fixed and invariant for each 
individual. Contact with self distorts group comparisons on contact because this 
component of contact inherently differs by race. Specifically, self-contact makes the 
assessed value of contact (p) intrinsically higher for members of the reference group 
and intrinsically lower for members of the comparison group. This is the source of 
bias in indices of uneven distribution. 

This can be understood intuitively by considering the situation where residential 
assignments are random. The expected representation of the reference group among 
neighbors will obviously be the same for all individuals and for both groups. But when 
self-contact is added in, the distribution of values on p necessarily shifts up for 
members of the reference group and necessarily shifts down for members of the 
comparison group. Index scores are computed from the difference of group means 
on scaled contact (y) scored from simple pairwise contact (p). Since all of the index-specific 
scaling functions (i.e., $y = f(p)$) score y as a positive, monotonic function 
of p, the expected distribution of y will necessarily be higher for the reference group 
than for the comparison group. As a result, standard versions of indices of uneven 
distribution are biased upward; that is, their expected values under random assignment 
($E[\cdot]$) are positive. 

I eliminate index bias in indices of uneven distribution by making a simple 
refinement to the contact calculation for individuals; I assess contact using counts 
for neighbors instead of area population. For purposes of discussion, I designate the 
revised version of contact as p’. This modification removes the fixed contribution of 
self-contact from the calculation of group contact scores for individuals. Intuitively, 
the expected representation of the reference group among neighbors is the same for 
all individuals under random assignment regardless of group membership. As a 
result, the expected distribution of values on contact with neighbors (p’) is the same 
for both groups. It follows necessarily that the same is true for the expected distribu- 
tion of scaled contact (y’) scored from p’. Accordingly, the expected value of the 
group difference of means on scaled contact (y’) also is zero under random assign- 
ment. Thus, indices of uneven distribution calculated in this way are unbiased. 
Below I develop this conclusion more carefully. In Chap. 16 I report results of 
empirical analyses demonstrating that indices of uneven distribution computed 
using this relatively simple refinement take an expected value of zero under random 
assignment. 

I should give credit where credit is due and note that a study by Laurie and Jaggi 
(2003) set me on the path to discovering a general strategy for developing unbiased 
versions of all popular indices of uneven distribution. Laurie and Jaggi used a 
Schelling-style agent simulation model to produce model-generated residential patterns 
in a virtual city.¹ As is common in agent models they assessed segregation at 
very small spatial scales. For purposes of the discussion here I consider the example 
of a city with a simple housing grid that is divided into small “blocks” based on 3 x 3 
square sections that contain 9 households.² Ordinarily, segregation assessed at this 
fine-grained spatial resolution would be subject to extremely high levels of index 
bias. For example, in a city with an 80/20 White-Black group ratio the value of E[D] 
would be 37.9 and the value of E[S] would be 11.1. Laurie and Jaggi (2003) measured 
segregation using an index of their own construction which they termed the 
“ensemble averaged, von Neumann segregation coefficient.” They designated their 
measure as “S” but I term it “LJ” here to credit them and also to avoid confusion 
with using S to designate the separation index. Laurie and Jaggi claimed their index 
had an expected value of zero under random distribution; that is, E[LJ] = 0. Initially 
I was skeptical of the claim. But I examined the behavior of their index in detail and 
discovered the claim was valid; Laurie and Jaggi’s LJ index was indeed “unbiased.” 
That is, over repeated trials of randomly generated residential distributions the 
distribution of values for scores on the LJ index will have a mean of zero. 

1 Laurie and Jaggi (2003) is one of many recent studies using Schelling-style agent simulation 
models, which are computer-implemented elaborations of the influential agent model of segregation 
dynamics first introduced in Schelling (1971). 

2 Laurie and Jaggi actually used a smaller, spatially delimited “von Neumann” or “rook’s” 
neighborhood which consists of the 4 neighboring households who share sides with a focal household in 
a housing grid. I use the 3 x 3 “bounded” neighborhood to correspond better with practices in 
sociological segregation studies. All findings I note in this discussion also apply to spatially 
delimited neighborhoods of any spatial scale. But I defer detailed discussion of this topic for another 
time. 

Intrigued by this property and its potential benefits for measuring segregation in 
agent-models, I examined the formula for their index more closely to see how it 
related to more well-known indices of uneven distribution (Fossett 2007). I found 
the formula yielded the average over all individuals of a “scaled” score on same-group 
contact. For each individual the scaled score is obtained by first taking the 
difference between the observed proportion same-group among the individual’s 
neighbors and the expected proportion based on the group’s representation in the 
population and then expressing this result as a proportion of the maximum possible 
deviation under complete segregation. Putting this in notation more familiar to 
demographers and sociologists, scores for White households (agents) were given by 
$(p_i - P)/(1 - P)$ where P is proportion White in the population of agents and $p_i$ is 
proportion White for the individual’s neighbors.³ Similarly, scores for Black households 
(agents) were given by $(q_i - Q)/(1 - Q)$ where Q is proportion Black in the 
population of agents and $q_i$ is proportion Black for the individual’s neighbors. The 
sum of these scores is then divided by T, the total number of households (agents), to 
obtain the overall average. The resulting expression (dropping subscripts for convenience 
of presentation) is 


$$LJ = (1/T) \cdot \left[ \sum (p - P)/(1 - P) + \sum (q - Q)/(1 - Q) \right].$$


Interestingly, I found the separate averages for Whites and Blacks calculated as 
shown below also gave the same result. That is, 


$$LJ = (1/W) \cdot \sum (p - P)/(1 - P) = (1/B) \cdot \sum (q - Q)/(1 - Q).$$

These expressions can be restated as follows 

$$LJ = \left( \sum p / W - P \right)/(1 - P) = \left( \sum q / B - Q \right)/(1 - Q).$$
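
As a minimal sketch of the LJ calculation described above, the Python below averages the scaled same-group contact score over all households, with contact assessed over each household's neighbors within its block (the focal household excluded). The function name `lj_index` and the representation of the city as a list of (White, Black) block counts are illustrative assumptions, not Laurie and Jaggi's own implementation.

```python
def lj_index(blocks):
    """LJ from a list of (w, b) block counts, with contact measured over neighbors.

    Assumptions of this sketch: both groups are present in the city and every
    occupied block holds at least two households.
    """
    W = sum(w for w, b in blocks)
    B = sum(b for w, b in blocks)
    T = W + B
    P, Q = W / T, B / T          # city proportions White and Black
    total = 0.0
    for w, b in blocks:
        t = w + b
        if t == 0:
            continue             # an empty block contributes no households
        if t < 2:
            raise ValueError("a household needs at least one neighbor")
        if w:
            p = (w - 1) / (t - 1)            # proportion White among a White household's neighbors
            total += w * (p - P) / (1 - P)   # scaled score, summed over the block's White households
        if b:
            q = (b - 1) / (t - 1)            # proportion Black among a Black household's neighbors
            total += b * (q - Q) / (1 - Q)   # scaled score, summed over the block's Black households
    return total / T
```

For two completely segregated blocks, `lj_index([(9, 0), (0, 9)])` returns 1.0. For two perfectly even blocks, `lj_index([(5, 5), (5, 5)])` is slightly negative, because each household has one fewer same-group neighbor than other-group neighbors.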


This expression reveals a close correspondence between LJ and Bell’s (1954) 
revised index of isolation ($I_R$). Bell’s $I_R$ expresses a group’s average for same-group 
contact as a proportion of its possible logical range. For Whites and Blacks, respectively, 
$I_R$ would be given as 


$$I_R = (P_{WW} - P)/(1 - P), \text{ and}$$
$$I_R = (P_{BB} - Q)/(1 - Q)$$


where: $P_{WW} = (1/W) \cdot \sum (w_i \cdot p_i)$; $P_{BB} = (1/B) \cdot \sum (b_i \cdot q_i)$; $P = W/T$; $Q = B/T$; 
W, B, and T are the city totals for the White, Black, and Total populations, respectively; 
$w_i$, $b_i$, and $t_i$ are the counts for White, Black and Total population in area i; 
and $p_i$ and $q_i$ are area proportion White and Black, respectively, based on $w_i/t_i$ and 
$b_i/t_i$. 

The contact expressions $P_{WW}$ and $P_{BB}$ can be restated as $\sum (w_i \cdot p_i)/W$ and 
$\sum (b_i \cdot q_i)/B$, respectively. If the calculations are expressed from the point of view 
of individuals, as in Laurie and Jaggi, they can be given as $\sum p/W$ and $\sum q/B$. Accordingly, 
$I_R$ for Whites and Blacks will take the same form given above for LJ. Thus, 


$$I_R = \left( \sum p / W - P \right)/(1 - P), \text{ and}$$
$$I_R = \left( \sum q / B - Q \right)/(1 - Q)$$


3 To clarify terms in this discussion, city-level terms are given as follows: W and B are totals for 
Whites and Blacks, respectively, T = W + B, P = W/T, and Q = B/T. For each individual, 
w and b are the number of White and Black neighbors in the relevant neighborhood, t = w + b, 
and p and q are proportion White and Black, respectively, based on p = w/t and q = b/t. 


As shown here the two measures, LJ and $I_R$, appear to be equivalent, but there 
is an important difference between them that causes $I_R$ and LJ to exhibit fundamentally 
different behavior. The difference in behavior is that Bell’s $I_R$ will manifest 
positive bias (i.e., $E[I_R] > 0$) while Laurie and Jaggi’s LJ will be unbiased (i.e., 
$E[LJ] = 0$). The difference in behavior traces to one crucial difference between the 
calculations for the two indices. It is the difference in how the values of p and q are 
calculated for LJ and $I_R$. For Bell’s $I_R$ the calculation of contact terms follows the 
standard methodological practice in sociological segregation studies; the contact 
terms p and q are calculated using count terms for the full area population. 
Significantly, this calculation includes the focal household in the count terms that 
appear in the numerator and the denominator of the contact calculations. In contrast, 
for Laurie and Jaggi’s LJ the calculation of contact terms p and q is based on a different 
procedure; it uses count terms for the focal household’s neighbors. Thus, the 
approach Laurie and Jaggi use excludes the focal household from the count terms 
used in the calculations. To clarify, the contact scores used in calculating $I_R$ and LJ 
differ as follows. 


For $I_R$, $p = w/t$ and $q = b/t$. 
For LJ, $p' = (w - 1)/(t - 1)$ and $q' = (b - 1)/(t - 1)$. 


I use the prime symbol to differentiate contact based on neighbors from contact 
based on area population. 
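
The two contact calculations can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than code from either source: the function name `contact_with_whites` and its arguments are hypothetical, and contact is expressed as contact with Whites for households of either group (for a Black household this equals one minus the same-group score q').

```python
def contact_with_whites(w, b, focal_is_white, exclude_self=False):
    """Proportion White in the focal household's area (p) or among its neighbors (p').

    w, b: White and Black household counts for the area, focal household included.
    """
    t = w + b
    if not exclude_self:
        return w / t                      # standard contact: the focal household is counted
    if t < 2:
        raise ValueError("a household needs at least one neighbor")
    own = 1 if focal_is_white else 0      # the focal household's own contribution to w
    return (w - own) / (t - 1)            # neighbor-based contact: the focal household is dropped


# A block with 6 White and 3 Black households:
p_standard = contact_with_whites(6, 3, focal_is_white=True)                       # 6/9 for every household
p_white = contact_with_whites(6, 3, focal_is_white=True, exclude_self=True)       # 5/8
p_black = contact_with_whites(6, 3, focal_is_white=False, exclude_self=True)      # 6/8
```

In the same block, the standard score is identical for all households, while the neighbor-based score is lower for the White household than for the Black household because the White household does not count itself.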

Closely comparing the design and behavior of the two measures led me to draw 
several conclusions. One is that, when focusing on a two-group comparison, the LJ 
index can be described as an unbiased version of $I_R$. Another is that the only difference 
between the standard (biased) and unbiased versions of $I_R$ is how contact is 
calculated. Specifically, self-contact is eliminated in the unbiased LJ version and 
this is accomplished by the simple exercise of excluding the focal household from 
the count terms that appear in the numerator and denominator of the contact calculations. 
This revealed that bias in $I_R$ traces to a single source: the impact of incorporating 
self-contact into the calculation of group contact scores for individuals. It 
also revealed that bias could be eliminated by following Laurie and Jaggi’s example 
and making the simple adjustment of computing group contact for individuals based 
on count terms for neighbors instead of count terms for area population. When this 
adjustment is implemented, values of $I_R$ take an average value of zero when calculated 
over repeated trials for random residential distributions. 

Based on these intriguing findings, I focused on the question of whether this measurement 
strategy could be adapted in a general way for application with measures 
of uneven distribution. I focused first on the separation index (S) as a natural first 
choice because it is equivalent to Bell’s revised index of isolation ($I_R$) in the special 
case where the city population consists of only two groups (James and Taeuber 
1985; Stearns and Logan 1986; White 1986).⁴ In light of this it is straightforward to 
describe Laurie and Jaggi’s LJ index as an unbiased version of the separation index 
(S). Thus, Laurie and Jaggi deserve credit for establishing the core strategy for 
developing an unbiased version of S. 

Initially I was frustrated in applying this insight to other indices of uneven distri- 
bution. The crucial insight of the strategy is to eliminate bias by eliminating the 
impact of self-contact from group contact calculations. But the best known comput- 
ing formulas for indices of uneven distribution do not provide an obvious opportu- 
nity for acting on this insight because they do not yield index scores as group 
differences in average contact outcomes for individuals. As one example, James and 
Taeuber (1985: 6) give the following widely used computing formula for calculating 
the value of the separation index 


$$S = \frac{1}{NPQ} \cdot \sum t_i (p_i - P)^2 .$$


This formula is efficient for computing values of S. But it does not give the value of 
S as a group difference in average contact scores for individuals. Moreover, I found 
that implementing the $p'_i$ adjustment used by Laurie and Jaggi in this formula did not 
yield an unbiased version of S with the desirable properties of the version established 
by Laurie and Jaggi. 

I then struck on a second key insight. It is that eliminating bias from index scores 
first requires that the index be formulated as a difference of means on residential 
outcomes scored from pairwise contact. This isolates the impact of group differ- 
ences in self-contact separately by group so its role can be eliminated. This prompted 
me to search for a formulation of the separation index that (a) would highlight the 
role of average group contact outcomes for individuals and (b) could be used as a 
template for deriving similar formulations for other popular indices of uneven 
distribution. 

The appendices outline a derivation that achieved this goal by expressing the separation 
index (S) as a group difference of means on contact with the reference group 
in the comparison.⁵ I review a generic formulation in the additional material but give 
the result here using the example of White-Black segregation with Whites being 
designated as the reference group. Thus, S is the White-Black difference in average 
contact with Whites based on 


$$S = P_{WW} - P_{BW}$$


where $P_{WW} = (1/W) \cdot \sum (w_i \cdot p_i)$, and $P_{BW} = (1/B) \cdot \sum (b_i \cdot p_i)$, with “W” and “B” 
designating total population for the reference group (Whites) and the comparison 
group (Blacks), respectively, $w_i$ and $b_i$ indicating area counts for the two groups, and 
$p_i = w_i/(w_i + b_i)$ indicating pairwise contact with the reference group for individuals 
residing in area “i”. 

4 That is, one can describe the separation index (S) as a special case of Bell’s Revised Index of 
Isolation ($I_R$) computed using only pairwise population counts. 

5 Later I found a similar derivation had been reported much earlier in a little known methodological 
paper by Becker et al. (1978). 
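
The agreement between the familiar computing formula and the difference of means formulation can be checked numerically. The Python below is a minimal sketch: the function names and the example counts are hypothetical, and it assumes the population term in the computing formula is the combined White and Black city total.

```python
def s_computing_formula(areas):
    """S as the sum of t_i * (p_i - P)^2 divided by (T * P * Q), for (w, b) area counts."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    T = W + B                     # assumed: combined White + Black city total
    P, Q = W / T, B / T
    return sum((w + b) * (w / (w + b) - P) ** 2 for w, b in areas if w + b > 0) / (T * P * Q)


def s_difference_of_means(areas):
    """S as the White-Black difference in mean contact with Whites (P_WW - P_BW)."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    p_ww = sum(w * w / (w + b) for w, b in areas if w + b > 0) / W   # mean p over White households
    p_bw = sum(b * w / (w + b) for w, b in areas if w + b > 0) / B   # mean p over Black households
    return p_ww - p_bw


areas = [(8, 1), (6, 3), (2, 7), (9, 0)]   # made-up area counts for illustration
gap = abs(s_computing_formula(areas) - s_difference_of_means(areas))
```

The two functions agree up to floating-point rounding, but only the second exposes the person-level contact scores where self-contact enters.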

Refining the contact calculations to eliminate the role of self-contact leads to the 
unbiased version of S given as 


$$S' = P'_{WW} - P'_{BW}$$


where $P'_{WW}$ and $P'_{BW}$ are contact expressions based on counts for neighbors instead 
of area population. They are obtained as follows: $P'_{WW} = (1/W) \cdot \sum (w_i \cdot p'_i)$, and 
$P'_{BW} = (1/B) \cdot \sum (b_i \cdot p'_i)$, with $p'_i$ being calculated from $(w_i - 1)/(w_i + b_i - 1)$ for 
Whites and from $(w_i - 0)/(w_i + b_i - 1)$ for Blacks. 
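
A short Python sketch may help make the standard and unbiased calculations concrete. It assumes area counts are supplied as (White, Black) pairs and that every occupied area holds at least two households; the function name `separation_index` and its `unbiased` flag are hypothetical names introduced for illustration.

```python
def separation_index(areas, unbiased=False):
    """White-Black difference in mean contact with Whites.

    areas: list of (w_i, b_i) counts. With unbiased=True, contact is computed
    from neighbors only (the focal household is removed from the counts).
    """
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    sum_white = sum_black = 0.0
    for w, b in areas:
        t = w + b
        if t == 0:
            continue                          # skip empty areas
        if unbiased:
            if t < 2:
                raise ValueError("a household needs at least one neighbor")
            p_white = (w - 1) / (t - 1)       # p' for a White household
            p_black = w / (t - 1)             # p' for a Black household
        else:
            p_white = p_black = w / t         # standard p, same for everyone in the area
        sum_white += w * p_white
        sum_black += b * p_black
    return sum_white / W - sum_black / B
```

With complete segregation, `separation_index([(9, 0), (0, 9)])` and the `unbiased=True` variant both equal 1. With exactly even areas such as `[(5, 5), (5, 5)]`, the standard score is 0 while the neighbor-based score is slightly negative (here -1/9), which is what allows its average under random assignment to settle at zero rather than above it.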


15.3 A More Detailed Exposition of Bias in the Separation 
Index 


I now review the issue of index bias for the separation index (S) in more detail. I 
continue with the example of White-Black segregation and for simplicity consider 
a situation where the city in question is not small, consists of only Whites and 
Blacks, and is divided into areas of constant size in terms of area population (t).⁶ I 
start with the question of “What can be expected when households are distributed 
randomly across housing units in all areas of the city?” For any household, White or 
Black, the expected contact with Whites is assessed using counts for neighbors. 
Normally I designate this as p’ but for the current discussion I also sometimes designate 
it as $p_N$ using the subscript “N” to indicate “computed for neighbors.” The 
expected value of this calculation is essentially equal to proportion White in the city 
(i.e., $E[P'_{WW}] = E[P'_{BW}] = P = W/(W + B)$).⁷ 

6 The assumption that the city is not small assures that an individual household has a negligible 
impact on the city-wide group proportion for the reference group (P). 

7 For ease of presentation, I ignore the impact of the focal household’s contribution to P for the city 
as a whole. In most empirical applications, the impact is negligible. 

Intuitively, this is easy to understand. When a household’s neighbors are obtained 
by a random draw from a large city population, the expected proportion White for 
the neighbors will be the city proportion White ($E[p_N] = P$). Note that the expected 
result is essentially the same for all households whether White or Black. More 
exactly, there is a very slight difference in expected value associated with the contribution 
the focal household makes to the combined total of Whites and Blacks and 
how this varies with the race of the focal household. In a very small city the P and 
Q results might differ slightly by race if one calculated P’ as $(W - 1)/(W + B - 1)$ 
for Whites and $(W - 0)/(W + B - 1)$ for Blacks. In larger cities this potential difference 
becomes negligible and I ignore it here for convenience of exposition. 

The results for expected contact in local areas can be quite different when contact 
with Whites (p) is assessed using counts for area population instead of counts for 
neighbors. Expected contact with Whites (E[p]) will now reflect the weighted average 
of two contributions. The first contribution is the household’s contact with 
White neighbors ($p_N$). As noted in the previous paragraph, this reflects a random 
draw of Whites and Blacks and its expected value is equal to P for both Whites and 
Blacks. The second contribution is the household’s self-contact with Whites ($p_S$). 
The value of self-contact will be 1 for White households and 0 for Black households 
so the contribution of self-contact to contact with Whites in the area population (p) 
varies systematically by race. The relative contribution of the two components of 
contact depends on the value of area (pairwise) population size (t). Contact with 
Whites based on area population can be given by 


$$p = p_N \cdot (t - 1)/t + p_S \cdot (1/t)$$


where t is area population and t - 1 is the number of neighbors a household has. 

Under random distribution, the expected value of the term $p_N \cdot (t - 1)/t$ is the 
same for every household in the city. But the term $p_S \cdot (1/t)$ is systematically different 
for Whites and Blacks. Specifically, it is 0/t for Blacks and 1/t for Whites. 
As a result, the expected value of mean contact with Whites is higher for Whites than 
for Blacks by 1/t. 
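
The decomposition of area-based contact into a neighbor component and a self component can be verified with exact arithmetic. The sketch below uses Python's `fractions` module; the function name `area_contact` and the example counts are hypothetical.

```python
from fractions import Fraction


def area_contact(white_neighbors, black_neighbors, focal_is_white):
    """Area-based contact with Whites, built up as p = p_N*(t-1)/t + p_S*(1/t)."""
    n = white_neighbors + black_neighbors     # number of neighbors, t - 1
    t = n + 1                                 # area population, focal household included
    p_N = Fraction(white_neighbors, n)        # contact with Whites among neighbors
    p_S = 1 if focal_is_white else 0          # self-contact with Whites
    return p_N * Fraction(t - 1, t) + p_S * Fraction(1, t)


# Same 8 neighbors (6 White, 2 Black), so t = 9; only the focal household's group differs.
p_for_white = area_contact(6, 2, focal_is_white=True)    # 7/9
p_for_black = area_contact(6, 2, focal_is_white=False)   # 6/9
difference = p_for_white - p_for_black                   # 1/9, that is, 1/t
```

Holding the neighbors fixed, the White household scores exactly 1/t higher on area-based contact than the Black household, which is the gap that carries through to the expected group difference.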

To further clarify, I examine expected contact separately by race. A White household’s 
expected number of White neighbors under random assignment is given by 
the household’s number of neighbors (t - 1) multiplied by expected contact with 
Whites for neighbors ($p_N$) which as noted above is $E[P'_{WW}] = P = W/(W + B)$. 
Unsurprisingly, the White household’s expected self-contact with Whites in the area 
population ($p_S$) is 1. As a result the expectation for White contact with Whites in the 
standard contact formulation based on area population (i.e., $E[P_{WW}]$) can be given as 
follows. 


$$E[P_{WW}] = E[P'_{WW}] \cdot ((t - 1)/t) + 1.0 \cdot (1/t)$$


A Black household’s expected number of White neighbors under random assignment 
is the same as that expected for a White household. It is given by the household’s 
number of neighbors (t - 1) multiplied by expected contact with Whites for 
neighbors ($p_N$) which as noted above is $E[P'_{BW}] = P = W/(W + B)$. Unsurprisingly, 
the Black household’s expected self-contact with Whites in the area population ($p_S$) 
is 0. As a result the expected value for Black contact with Whites in the standard 
contact formulation based on area population (i.e., $E[P_{BW}]$) can be given as 
follows. 


$$E[P_{BW}] = E[P'_{BW}] \cdot ((t - 1)/t) + 0.0 \cdot (1/t)$$
Because BE eee | =P, it is now becomes clear that upward bias in the 
separation index (S) traces solely to role of self-contact in the group contact calcu- 
lations for the standard formula for the index. 
In the difference of means formulation S= Pyw —P,,, (given here in pairwise P* 


contact notation) and the expected value of S is given by the expected value of its 
components. That is, E[S]= E[P |- E[Paw | . This can be evaluated as follows. 


WW 


E[S] = | ((t-1)/t)-E[ Py J+(1/t)-1.0 |-[((t-1)/t)-E[ Pay ]+(17t)-0.0] 
E[s] = [((t-1)/t)-B[ Pow ]=[((t-1) /)-E[ Pow ]+[(1/0)-1]-(/0)-0] 
E[S] = [((t-1)/t)-Py -((t-1)/t)-Py ]+[(1/t)-1]-(1/t)-0] 
E[S] = (1/t)-1-(1/t)-0 
E [S] =1/t 
Note that this result is identical to the expected value for S previously established 
and reported by Winship (1977: 1064). 
Now consider the expected value for the separation index when contact for 
individuals is assessed using counts for neighbors instead of counts for area 
population.⁸ 


$$\begin{aligned}
E[S'] &= E[P'_{WW}] - E[P'_{BW}] \\
E[S'] &= P - P \\
E[S'] &= 0
\end{aligned}$$


⁸ Again, this assumes city size is sufficiently large that an individual household’s contribution to P 
is negligible. 

This establishes that an unbiased version of the separation index (i.e., S′ with 
E[S′] = 0) can be obtained by eliminating the role of self-contact when assessing 
each individual’s contact with the reference group. 
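
A small Monte Carlo exercise can illustrate the two expectations. The sketch below is a hedged stand-in, not the procedure used in the monograph: the function names, the city of 1,000 areas of 21 households, the 0.90 share White, the five replications and the seed are all assumptions made for illustration. Households are shuffled across areas, and S is computed once with self-contact included and once from neighbors only.

```python
import random

def separation_scores(areas):
    """Return (S, S_prime) for a list of areas, each a list of 1 (White) / 0 (Black)."""
    W = sum(sum(a) for a in areas)
    B = sum(len(a) - sum(a) for a in areas)
    ww = bw = ww_n = bw_n = 0.0
    for a in areas:
        t, w = len(a), sum(a)
        b = t - w
        ww += w * (w / t)                # Whites' contact with Whites, self included
        bw += b * (w / t)                # Blacks' contact with Whites, area based
        ww_n += w * ((w - 1) / (t - 1))  # Whites' contact with Whites among neighbors
        bw_n += b * (w / (t - 1))        # Blacks' contact with Whites among neighbors
    return ww / W - bw / B, ww_n / W - bw_n / B

def simulate(n_areas=1000, t=21, P=0.9, reps=5, seed=12345):
    """Average S and S' over several random assignments (all settings are assumed)."""
    rng = random.Random(seed)
    n = n_areas * t
    n_white = round(n * P)
    households = [1] * n_white + [0] * (n - n_white)
    s_vals, sp_vals = [], []
    for _ in range(reps):
        rng.shuffle(households)
        areas = [households[i:i + t] for i in range(0, n, t)]
        s, sp = separation_scores(areas)
        s_vals.append(s)
        sp_vals.append(sp)
    return sum(s_vals) / reps, sum(sp_vals) / reps

mean_S, mean_S_prime = simulate()
print(f"mean S  = {mean_S:.4f}  (1/t = {1 / 21:.4f})")
print(f"mean S' = {mean_S_prime:.4f}")
```

With areas of uniform size the two versions are linked exactly by S = S′·(t − 1)/t + 1/t, so the standard score sits near 1/t whenever the neighbor-based score sits near zero.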


15.4 Situating This Result and Its Implications 
in the Difference of Means Framework 


I now recast the results for S just presented in the notation of the more general 
difference of means framework. In that framework the standard formula for S is 


$$S = 100 \cdot (Y_W - Y_B) = (1/W) \cdot \sum (w_i \cdot y_i) - (1/B) \cdot \sum (b_i \cdot y_i).$$


When computing S by this formula, values of y_i are set according to the 
index-specific scaling function y = f(p). In the case of S, the scaling function is the 
identity function and thus y_i = p_i. Accordingly, the contact formula for S for 
White-Black segregation given in the preceding section 


$$S = P_{WW} - P_{BW} = (1/W) \cdot \sum (w_i \cdot p_i) - (1/B) \cdot \sum (b_i \cdot p_i)$$


can be converted into the difference of means formula for S by simply substituting 
y_i for p_i. 
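
The substitution can be illustrated with a short sketch that computes an index as a difference of group means on scaled contact. Everything named here is hypothetical: the function `difference_of_means`, the four-area city, and the two scaling functions. The identity scaling gives S as described above; the 0-1 scaling (1 when area proportion White is at or above the city proportion) is the scoring for D described in the worked example later in the chapter. Scores are on the 0-1 scale rather than multiplied by 100.

```python
def difference_of_means(areas, scale):
    """Index = mean(y | White) - mean(y | Black), where y = scale(p, P) for each area."""
    W = sum(w for w, b in areas)
    B = sum(b for w, b in areas)
    P = W / (W + B)
    y_white = sum(w * scale(w / (w + b), P) for w, b in areas) / W
    y_black = sum(b * scale(w / (w + b), P) for w, b in areas) / B
    return y_white - y_black

def identity(p, P):
    """Scaling for S: y = p."""
    return p

def at_or_above_parity(p, P):
    """Scaling for D: y = 1 if p >= P, else 0."""
    return 1.0 if p >= P else 0.0

# Hypothetical city: (White count, Black count) for each area.
areas = [(9, 1), (5, 5), (2, 8), (8, 2)]

S = difference_of_means(areas, identity)
D = difference_of_means(areas, at_or_above_parity)

# Familiar computing formula for D, for comparison.
W = sum(w for w, b in areas)
B = sum(b for w, b in areas)
D_classic = 0.5 * sum(abs(w / W - b / B) for w, b in areas)

print(S, D, D_classic)
```

For this toy city the difference of means for D agrees with the familiar half-sum-of-absolute-differences formula, which is the sense in which only the scaling function changes from index to index.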

I introduce the unbiased version of the separation index (S') first for two reasons. 
One is that, as mentioned earlier, it was the first index for which I was able to estab- 
lish an unbiased version. The second is that the nature of bias for S is especially 
straightforward and easy to explain. But S is not a special case among indices of 
uneven distribution. The core strategy of revising the formula to remove the contri- 
bution of self-contact can be applied to any index of uneven distribution that can be 
placed in the difference of means framework. 

In standard index calculations group contact is assessed using area population 
counts and thus reflects the weighted average of two components. The first component 
registers contact with neighbors. The expected value of this component of 
contact is the same for all individuals and groups in the comparison and so does not 
contribute to index bias. The second component registers self-contact which is fixed 
for every individual and differs systematically by group. This introduces bias by 
systematically inflating contact scores for members of the reference group and 
reducing contact scores for members of the comparison group. Eliminating the sec- 
ond component from contact calculations yields unbiased group means on contact 
scores and this results in an unbiased index score. 

To summarize, the following two important conclusions apply to all popular 
indices of uneven distribution (including G, D, A, R, and H) that can be placed in 
the difference of means framework. 

• bias in standard index formulations traces to calculating group contact (p_i) for 
  households based on area population counts, and 

• unbiased versions of the index can be obtained by calculating group contact 
  based on counts for neighbors. 


I now briefly review how these conclusions generalize and apply to other popular 
and widely used indices of uneven distribution. 


15.4.1 Expected Distributions of p' and y' Under Random 
Assignment 


When households are randomly assigned to areas, the expected distribution of raw 
contact scores calculated using counts for neighbors (hereafter designated p′_i) will 
be the same for both Whites and Blacks. As a result, expected values for group 
means on scaled exposure (y′_i) scored based on any index-specific scaling of “raw” 
contact among neighbors (p′_i) will be the same for both Whites and Blacks (i.e., 
E[Y′_W] = E[Y′_B]). 

This can be established as follows. The expected distribution of values for raw 
contact with the reference group (p′_i) calculated using counts for neighbors will be 
given by the binomial probability distribution for a given number of neighbors. This 
expected distribution will be the same regardless of whether the focal household for 
this set of neighbors is White or Black. Thus, the expected distribution of p′_i will be 
the same for Whites and Blacks. Values of contact with the reference group (p′_i) 
determine residential outcome scores (y′_i). So the expected distribution of contact 
scores (p′_i) directly determines the expected distribution of residential outcome 
scores (y′_i). This also will be the same for Whites and Blacks. The expected 
distribution of residential outcomes (y′_i) determines the expected mean on scaled contact 
(Y′) and this also will be the same for Whites and Blacks. Because the expected 
means on scaled contact are the same for Whites and Blacks (i.e., Y′_W = Y′_B), the 
expected group difference of means (i.e., Y′_W − Y′_B) under random 
assignment is zero. This leads to the following general conclusion. 


Scores for indices computed as a difference of means in scaled contact with the reference 
group calculated for neighbors (instead of area population) will be unbiased. That is, the 
expected value of index scores under random assignment will be zero (0.0). 
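
The argument can be written compactly. In the sketch below, n = t − 1 is the number of neighbors, k is the count of White neighbors, and f is any index-specific scaling function; these symbol names are shorthand introduced here for illustration, and the binomial model is the one assumed in the text (with city size large enough that the focal household's own contribution to P is negligible).

```latex
\begin{aligned}
\Pr(k \mid \text{White focal household})
  &= \Pr(k \mid \text{Black focal household})
   = \binom{n}{k} P^{k} (1-P)^{n-k}, \\
E[Y'_W]
  &= \sum_{k=0}^{n} \binom{n}{k} P^{k} (1-P)^{n-k}\, f\!\left(\tfrac{k}{n}\right)
   = E[Y'_B], \\
E[Y'_W] - E[Y'_B] &= 0 .
\end{aligned}
```

Because the focal household's race appears nowhere on the right-hand side, the equality holds for every choice of f, linear or not.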


15.5 Reviewing a Simple Example in Detail 


It is instructive to review a simple example in some detail to show how expected 
group means on residential outcomes (y) differ depending on whether an individual’s 
contact with the reference group (p) is assessed using counts for neighbors or 
counts for area population. For purposes of illustration I consider the example of a 
hypothetical city where the population consists of only Whites and Blacks, proportion 
White for the city (P) is equal to 0.90, and area size (t_i) is equal to 21 households.⁹ 
Table 15.1 presents the expected distributions for contact scores (p) and 
index-specific residential outcome scores (y) for the dissimilarity index (D) and the 
separation index (S) under random residential distributions based on a binomial 
probability model. Table 15.2 presents similar results for the Hutchens square root 
index (R) and the Theil entropy-based index (H). The first panel in each table gives 
the results when households’ contact with Whites is assessed using counts for 
neighbors. The second panel in each table gives the parallel results when households’ 
contact with Whites is assessed using counts for area population in the standard 
way. 

⁹ The number of households is substantially higher than would be found in typical census blocks 
but substantially lower than would be found in typical census block groups. 


Table 15.1 Calculations to obtain values of D and S for White-Black segregation from differences 
of group means on residential outcomes (y) based on contact with Whites for area population and 
among neighbors under random distribution 

Count of       Whites   Blacks Share of Share of   Whites   Blacks   Whites   Blacks
Whites              p        p   Whites   Blacks      y_D      y_D      y_S      y_S
               (×100)   (×100)                     (×100)   (×100)   (×100)   (×100)

Among neighbors
0                0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
1-11                –        –        –        –        –        –        –        –
12              60.00    60.00     0.04     0.04     0.00     0.00    60.00    60.00
13              65.00    65.00     0.20     0.20     0.00     0.00    65.00    65.00
14              70.00    70.00     0.89     0.89     0.00     0.00    70.00    70.00
15              75.00    75.00     3.19     3.19     0.00     0.00    75.00    75.00
16              80.00    80.00     8.98     8.98     0.00     0.00    80.00    80.00
17              85.00    85.00    19.01    19.01     0.00     0.00    85.00    85.00
18              90.00    90.00    28.52    28.52   100.00   100.00    90.00    90.00
19              95.00    95.00    27.02    27.02   100.00   100.00    95.00    95.00
20             100.00   100.00    12.16    12.16   100.00   100.00   100.00   100.00
Sum or mean                      100.00   100.00    67.69    67.69    90.00    90.00

For area population
0                 N/A     0.00     0.00     0.00      N/A     0.00      N/A     0.00
1-11                –        –        –        –        –        –        –        –
12              57.14    57.14     0.01     0.04     0.00     0.00    57.14    57.14
13              61.90    61.90     0.04     0.20     0.00     0.00    61.90    61.90
14              66.67    66.67     0.20     0.89     0.00     0.00    66.67    66.67
15              71.43    71.43     0.89     3.19     0.00     0.00    71.43    71.43
16              76.19    76.19     3.19     8.98     0.00     0.00    76.19    76.19
17              80.95    80.95     8.98    19.01     0.00     0.00    80.95    80.95
18              85.71    85.71    19.01    28.52     0.00     0.00    85.71    85.71
19              90.48    90.48    28.52    27.02   100.00   100.00    90.48    90.48
20              95.24    95.24    27.02    12.16   100.00   100.00    95.24    95.24
21             100.00      N/A    12.16      N/A   100.00      N/A   100.00      N/A
Sum or mean                      100.00   100.00    67.69    39.17    90.48    85.71

Notes: “N/A” indicates the combination does not occur. “–” indicates outcomes are omitted 
because their frequency is negligible 


Table 15.2 Calculations to obtain values of R and H for White-Black segregation from differences 
of group means on residential outcomes based on contact with Whites for area population and 
among neighbors under random distribution 

Count of       Whites   Blacks Share of Share of   Whites   Blacks   Whites   Blacks
Whites              p        p   Whites   Blacks      y_R      y_R      y_H      y_H
               (×100)   (×100)                     (×100)   (×100)   (×100)   (×100)

Among neighbors
0                0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
1-11                –        –        –        –        –        –        –        –
12              60.00    60.00     0.04     0.04    28.99    28.99    42.11    42.11
13              65.00    65.00     0.20     0.20    31.24    31.24    45.70    45.70
14              70.00    70.00     0.89     0.89    33.74    33.74    49.56    49.56
15              75.00    75.00     3.19     3.19    36.60    36.60    53.79    53.79
16              80.00    80.00     8.98     8.98    40.00    40.00    58.54    58.54
17              85.00    85.00    19.01    19.01    44.24    44.24    64.06    64.06
18              90.00    90.00    28.52    28.52    50.00    50.00    70.83    70.83
19              95.00    95.00    27.02    27.02    59.23    59.23    80.08    80.08
20             100.00   100.00    12.16    12.16   100.00   100.00   100.00   100.00
Sum or mean                      100.00   100.00    55.96    55.96    73.69    73.69

For area population
0                 N/A     0.00     0.00     0.00      N/A     0.00      N/A     0.00
1-11                –        –        –        –        –        –        –        –
12              57.14    57.14     0.01     0.04    27.79    27.79    40.15    40.15
13              61.90    61.90     0.04     0.20    29.82    29.82    43.45    43.45
14              66.67    66.67     0.20     0.89    32.04    32.04    46.95    46.95
15              71.43    71.43     0.89     3.19    34.51    34.51    50.73    50.73
16              76.19    76.19     3.19     8.98    37.35    37.35    54.87    54.87
17              80.95    80.95     8.98    19.01    40.73    40.73    59.52    59.52
18              85.71    85.71    19.01    28.52    44.95    44.95    64.93    64.93
19              90.48    90.48    28.52    27.02    50.68    50.68    71.57    71.57
20              95.24    95.24    27.02    12.16    59.85    59.85    80.63    80.63
21             100.00      N/A    12.16      N/A   100.00      N/A   100.00      N/A
Sum or mean                      100.00   100.00    56.56    46.34    74.35    66.04

Notes: “N/A” indicates the combination does not occur. “–” indicates outcomes are omitted 
because their frequency is negligible 

I first review the results in Table 15.1. The first column in the first panel of the 
table lists the possible counts for Whites among neighbors. The areas in the example 
have 21 total households so every household has exactly 20 neighbors, a situation 
that would be common when measuring segregation using block-level data. Except 
for the outcome of 0, which warrants separate comment, the outcomes for counts of 
White neighbors below 12 are omitted from the listing because their occurrence 
under random distribution is quantitatively negligible. The values of proportion 
White among neighbors (p′) are given separately for Whites and Blacks in the next 
two columns. Note that proportion White among neighbors is the same for both 
Whites and Blacks under all possible combinations. The share (that is, the 
proportion) of households in the group expected to experience each of the possible levels 
of contact under random distribution is given separately for Whites and Blacks in 
the next two columns. Note that group shares at every outcome are the same for both 
White and Black households. Scores of residential outcomes (y′) scored from p′ for use 
in computing the dissimilarity index (D) under the difference of means calculation 
approach are reported separately for Whites and Blacks in the next two columns. 
Scores of residential outcomes (y′) relevant for computing the separation index (S) 
are reported separately for Whites and Blacks in the last two columns. The results 
for the expected group means on index-specific residential outcomes are given in 
the bottom row of the panel. These are obtained by summing the products of group 
shares and residential outcomes scores (y’). 

Table 15.2 continues the exercise and has the same structure as Table 15.1. The 
only difference is that it provides information on the residential outcomes (y’) that are 
used in computing the Hutchens square root index (R) and the Theil entropy index (H). 

The results for the analysis in the first panels in Tables 15.1 and 15.2 are easy to 
summarize. For all four indices (D, S, R, and H), Whites and Blacks both experience 
all possible outcomes on p′ and both groups have identical expected distributions 
across possible outcomes on number of White neighbors. Accordingly, they have 
identical expected values for the means on the unbiased version of the residential 
outcome scores (y′) that determine each segregation index score. Consequently, the 
expected values of D′, S′, R′, and H′ all are zero (0.0). For example, proportion 
White among neighbors equals or exceeds the city-wide proportion (0.90) when the 
count of White neighbors is 18, 19, or 20. Residential outcomes (y′) relevant for calculating 
D are scored 1.0 in these cases and 0.0 in all other cases. Column 4 shows that 
67.69 % of Whites experience this residential outcome. Column 5 shows that the 
same is true for Blacks. Accordingly, the expected mean for the 0-1 scoring of y′ 
scored for D is 0.6769 for both Whites and Blacks (values shown in the final row of 
columns 6 and 7). This result shows that Whites and Blacks are equally likely to 
reside in areas where their contact with White neighbors equals or exceeds the 
proportion White in the city as a whole. As a result, the expected value of D′ is 0.0 (i.e., 
E[D′] = (E[Y′_W] − E[Y′_B]) = (0.6769 − 0.6769)). 

The group means (Y′) in columns 8 and 9 show that Whites and Blacks 
also experience identical average levels of contact with White neighbors; 
specifically, on average 90.0 % of their neighbors are White, a level of contact 
matching the representation of Whites in the city population overall. So the expected 
value of S′ also is 0.0 (i.e., E[S′] = (E[Y′_W] − E[Y′_B]) = (0.9000 − 0.9000)). 
Similar results are seen when residential outcomes are scored as relevant 
for computing the Hutchens square root index (R′) (i.e., 
E[R′] = 0.0 = (E[Y′_W] − E[Y′_B]) = (0.5596 − 0.5596)) and the Theil entropy 
index (H′) (i.e., E[H′] = 0.0 = (E[Y′_W] − E[Y′_B]) = (0.7369 − 0.7369)). 

These results are easy to summarize. When neighbors are a random draw, Whites 
and Blacks have identical probability distributions for experiencing different levels 
of unbiased contact with White neighbors (p’). It then follows that Whites and 
Blacks also have identical group means on residential outcomes (y') scored from 
unbiased contact with White neighbors (p’). 
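
The neighbor-based figures for D and S can be reproduced directly from the binomial model. The following is a minimal sketch using the example's values (P = 0.90, 20 neighbors); the variable names are assumptions of this illustration, and the comparison `10 * k >= 9 * n` is simply an integer form of p′ ≥ 0.90 that is hard-coded for this value of P.

```python
from math import comb

P, n = 0.90, 20   # city proportion White and number of neighbors in the example

def share(k):
    """Binomial probability that exactly k of the n neighbors are White."""
    return comb(n, k) * P**k * (1 - P)**(n - k)

# The same distribution applies to White and Black focal households.
shares = {k: share(k) for k in range(n + 1)}

# Residential outcomes scored from contact among neighbors, p' = k / n.
mean_y_D = sum(s for k, s in shares.items() if 10 * k >= 9 * n)  # y' = 1 if p' >= 0.90
mean_y_S = sum(s * k / n for k, s in shares.items())             # y' = p'

for k in range(12, n + 1):
    print(f"{k:2d}  p' = {100 * k / n:6.2f}  share = {100 * shares[k]:5.2f}")
print(f"E[Y'] scored for D: {mean_y_D:.4f}")
print(f"E[Y'] scored for S: {mean_y_S:.4f}")
```

Since the function takes no argument for the race of the focal household, the group means it produces are necessarily equal, which is the point of the first panel.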

I now review the results in the second panel of Tables 15.1 and 15.2 where con- 
tact with Whites is computed in the standard way based on counts for area popula- 
tion. The results here play out much differently. The key change producing the 
differences is that counts in the numerator and denominator of the calculation of 
proportion White (p) now include the focal household. Accordingly, the value for a 
household’s contact with Whites (p) based on area population reflects a weighted 
average of the household’s contact with Whites for neighbors (p′) and the 
household’s self-contact with Whites, designated here by p_S, which is 1 = (1/1) for White 
households and 0 = (0/1) for Black households. The relevant expression is 


$$p = p' \cdot (20/21) + p_S \cdot (1/21)$$


The distribution of values for contact with Whites among neighbors (p′) remains the 
same as before. This means that all changes in contact with Whites in the lower 
panel trace to the impact of self-contact with Whites (p_S), which is systematically 
different for Whites and Blacks. 

To see the implications it is useful to consider how the results change for a 
household with 18 White neighbors, the case that in this example has important 
implications for the expected value of the dissimilarity index. For both White and Black 
households who have 18 White neighbors the value of contact with Whites among 
neighbors (p′) is 0.90 and results in a value of y′ = 1 when residential outcomes (y′) 
are scored as relevant for the dissimilarity index (D). The results change when 
contact with Whites is based on area population (p). For a White household the value of 
contact with Whites based on area population (p) is given by 


$$\begin{aligned}
p &= p' \cdot (20/21) + p_S \cdot (1/21) \\
  &= (18/20) \cdot (20/21) + (1/1) \cdot (1/21) \\
  &= (0.90 \cdot 0.9524) + 0.0476 \\
  &= 0.8571 + 0.0476 \\
  &= 0.9048.
\end{aligned}$$

For a Black household the value of p is given by 


$$\begin{aligned}
p &= p' \cdot (20/21) + p_S \cdot (1/21) \\
  &= (18/20) \cdot (20/21) + (0/1) \cdot (1/21) \\
  &= (0.90 \cdot 0.9524) + 0.0 \\
  &= 0.8571 + 0.0 \\
  &= 0.8571.
\end{aligned}$$


The White and Black households have identical contact with Whites among 
neighbors and accordingly in the upper panel are scored identically on the residential 
outcome (y′ = 1) relevant for computing D′. But in the lower panel the residential 
outcome (y) relevant for computing D is scored 1 for the White household (based 
on 0.9048 ≥ 0.90) and 0 for the Black household (based on 0.8571 < 0.90). 

The expected proportion of households that have 18 White neighbors is 0.2852 
for both Whites and Blacks. The difference in how these households are scored on 
scaled contact with Whites in the upper and lower panels contributes to determining 
the level of bias in D. Whites are scored the same in both the upper and lower 
panels; y′ = y = 1. But Blacks are scored differently in the upper and lower panels; 
y′ = 1 in the upper panel and y = 0 in the lower panel. This difference reduces the 
expected Black mean on scaled contact with Whites from E[Y′_B] = 0.6769 based 
on neighbors in the upper panel to E[Y_B] = 0.3917 based on area population in the 
lower panel. In contrast, the expected White mean on scaled contact with Whites is 
the same (E[Y′_W] = E[Y_W] = 0.6769) under both calculations. Thus, the 
expected value of D changes from 0.0 when contact with Whites (p′) is 
based on neighbors (E[D′] = E[Y′_W] − E[Y′_B] = 0.6769 − 0.6769 = 0.0) to 
0.2852 when contact with Whites (p) is based on area population 
(E[D] = E[Y_W] − E[Y_B] = 0.6769 − 0.3917 = 0.2852). 

Scaling to 100 in keeping with convention, the “bias” in the standard 
version of the index of dissimilarity (D) under random distribution is 
28.52. The parallel calculations for the separation index (S) 
(E[S] = E[Y_W] − E[Y_B] = 0.9048 − 0.8571 = 0.0477) indicate that bias in the 
standard version is 4.77. The interested reader can confirm that these values for E[D] 
and E[S] are equal to values of E[D] and E[S] obtained using analytic formulas 
given in Winship (1977). 
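
The area-population figures can be recomputed in the same way. The sketch below is an illustration under the example's settings (P = 0.90, t = 21); the function and variable names are assumptions, and the comparison `10 * whites_in_area >= 9 * t` is an integer form of p ≥ 0.90 that is hard-coded for this value of P. The only change from a neighbor-based calculation is that the focal household is added to the count of Whites when it is White.

```python
from math import comb

P, t = 0.90, 21   # city proportion White and area size in the example
n = t - 1         # number of neighbors

def share(k):
    """Binomial probability that exactly k of the n neighbors are White."""
    return comb(n, k) * P**k * (1 - P)**(n - k)

def expected_means(self_white):
    """Expected y for D and S when p counts the focal household (p_S = self_white)."""
    y_D = y_S = 0.0
    for k in range(n + 1):
        whites_in_area = k + self_white          # neighbors plus self
        at_parity = 10 * whites_in_area >= 9 * t  # p >= 0.90
        y_D += share(k) * (1.0 if at_parity else 0.0)
        y_S += share(k) * whites_in_area / t
    return y_D, y_S

white_D, white_S = expected_means(1)
black_D, black_S = expected_means(0)
E_D = white_D - black_D
E_S = white_S - black_S

print(f"E[Y_W], E[Y_B] scored for D: {white_D:.4f}, {black_D:.4f}  ->  E[D] = {E_D:.4f}")
print(f"E[Y_W], E[Y_B] scored for S: {white_S:.4f}, {black_S:.4f}  ->  E[S] = {E_S:.4f}")
```

Carried at full precision, E[S] is exactly 1/21, or about 0.0476; the 0.0477 obtained above comes from subtracting the two rounded group means.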

This example reveals in detail how bias enters into the picture and distorts scores 
for standard versions of indices of uneven distribution. The example also documents 
how the simple refinement of assessing group contact based on neighbors instead of 
area population eliminates index bias for all indices of uneven distribution that can 
be placed in the differences of means formulation. The basis for this welcome result 
is easy to summarize. When self-contact is eliminated from that calculation, the two 
groups in the comparison will have identical expected distributions for the number 
of neighbors from the reference group and the number of neighbors from the 
comparison group. It then follows that expected group means on residential outcomes 
(y′) scored from the distribution of unbiased contact values (p′) will be identical for 
both groups. 


15.5.1 Additional Reflections on Results Presented 
in Tables 15.1 and 15.2 


The analysis presented in Tables 15.1 and 15.2 clarifies how index bias originates in 
the role of self-contact. The results provide an intuitive basis for understanding why 
bias is greater when effective area size (ENS) is small. It is because self-contact will 
have a bigger impact on assessments of an individual’s contact with the reference 
group when area counts are small as they are in this example. If the same exercise 
were repeated with area population size set to 5,001 instead of 21, the resulting 
magnitude of index bias would be much smaller. Alternatively, if the exercise were 
repeated with area counts of 9 (equivalent to a “Queen’s” neighborhood of eight 
adjacent neighbors plus the focal household), the magnitude of index bias would be 
even larger. 
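
The dependence on area size can be sketched numerically. The code below is an illustration under stated assumptions: P is held at 0.90, the three area sizes are the ones mentioned above, the function names are hypothetical, E[S] is taken as 1/t from the result derived earlier, and E[D] is obtained by summing the binomial probabilities of the outcomes in which a White household is scored 1 and a Black household with the same neighbors is scored 0. The probabilities are computed in log space so that the case t = 5,001 does not overflow.

```python
from fractions import Fraction
from math import exp, lgamma, log

def binom_pmf(n, k, p):
    """Binomial probability computed in log space so large n does not overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)

def expected_bias(t, P):
    """Return (E[D], E[S]) under random assignment for area size t (0-1 scale)."""
    n = t - 1
    p = float(P)
    e_d = 0.0
    for k in range(n + 1):
        white_scored_1 = (k + 1) >= P * t   # White household: self counted as White
        black_scored_1 = k >= P * t         # Black household: self not counted
        if white_scored_1 and not black_scored_1:
            e_d += binom_pmf(n, k, p)
    return e_d, 1.0 / t                     # E[S] = 1/t, as derived earlier

P = Fraction(9, 10)   # exact threshold avoids floating-point ties
bias = {t: expected_bias(t, P) for t in (9, 21, 5001)}
for t, (e_d, e_s) in bias.items():
    print(f"t = {t:5d}   E[D] = {100 * e_d:6.2f}   E[S] = {100 * e_s:6.2f}")
```

Both expected scores shrink as t grows, with the expected value of D remaining well above that of S at every area size considered.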

Reflecting on the difference between unbiased contact (p’) and standard contact 
(p) also yields additional insight into why the expected level of bias varies from 
index to index. The role of self-contact in standard calculations of contact is to shift 
the distribution of values of p up for the reference group and down for the compari- 
son group. These shifts in p are then translated into impacts on scaled contact (y) 
based on the index-specific scaling function y = f(p). I established earlier that the 
scaling functions for G, D, R, and H are nonlinear. The nonlinearity has implica- 
tions for bias. Specifically, bias at the level of group differences on raw contact (p) 
will translate into larger group differences in scaled contact (y) when the scaling 
function is nonlinear and the magnitude of bias is greater when the scaling function 
is more strongly nonlinear. This provides a succinct explanation for why levels of 
bias are higher for G and D compared to R and H and why the level of bias is lowest 
for S. The scaling function y=f (p) for S is linear; so bias impacting the value of p 
is carried forward unchanged. The scaling functions for G and D depart from linear- 
ity the most; so bias impacting p is “amplified” to a greater degree when values of y 
are assigned for these indices. The scaling functions for R and H involve milder 
nonlinearity; so, while bias impacting p also is amplified when values of y are 
assigned, the resulting distortion is not as dramatic. 

Finally, this also provides an explanation for why bias in S does not vary with 
group size, but bias in the other measures, and especially in G and D, does vary with 
group size. The reason is that the nonlinear scaling functions for G and D measures 
become more strongly nonlinear when groups are unequal in size. This means that 
the role of nonlinearity in exaggerating group differences in y scored from p is mag- 
nified for these measures when groups are more imbalanced in size. 
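
The contrast between S and D can be illustrated by holding area size fixed and varying group size. This is a sketch under assumptions: t is fixed at 21, the four values of P are arbitrary illustrative choices, and the function name is hypothetical. Expected group means are computed from the binomial distribution of White neighbors with the focal household added to the area count.

```python
from fractions import Fraction
from math import comb

def expected_D_and_S(t, P):
    """Return (E[D], E[S]) under random assignment with self-contact included."""
    n = t - 1
    p = float(P)
    white_D = black_D = white_S = black_S = 0.0
    for k in range(n + 1):
        pmf = comb(n, k) * p**k * (1.0 - p)**(n - k)
        white_D += pmf * ((k + 1) >= P * t)   # y = 1 if area proportion White >= P
        black_D += pmf * (k >= P * t)
        white_S += pmf * (k + 1) / t          # y = p
        black_S += pmf * k / t
    return white_D - black_D, white_S - black_S

t = 21
shares = [Fraction(1, 2), Fraction(7, 10), Fraction(9, 10), Fraction(49, 50)]
table = [(float(P), *expected_D_and_S(t, P)) for P in shares]
for P, e_d, e_s in table:
    print(f"P = {P:4.2f}   E[D] = {100 * e_d:6.2f}   E[S] = {100 * e_s:5.2f}")
```

In this sketch E[S] stays at 1/t for every P, while E[D] rises as the groups become more unequal in size.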
15.6 Summary 


This chapter reviews how the difference of means formulation of indices of uneven 
distribution leads to new insights about the nature of index bias and makes it pos- 
sible to address index bias at the point of measurement. The insight is that, when 
segregation is cast as group differences in means on scaled group contact, bias can 
be traced to a relatively simple source; namely, the role of self-contact which inher- 
ently and unsurprisingly differs by race. Eliminating self-contact from index calcu- 
lations by assessing group contact based on neighbors instead of area population 
eliminates this inherent source of bias in index scores. The chapter shows that the 
resulting “unbiased” versions of indices are attractive for many reasons. 
They are attractive on formal grounds because analysis based on binomial probabil- 
ity models shows that they have expected values of zero under random assignment. 
They are attractive because the index refinements are easy to explain; for any indi- 
vidual group contact can be a random draw when computed using neighbors but it 
is always inherently biased when computed using area population that includes the 
individual. Finally, the unbiased versions of indices introduced here are attractive 
because they allow researchers to use familiar indices and apply familiar substantive 
interpretations as well as new interpretations. 

The next chapter presents evidence on another aspect of the unbiased versions of 
indices of uneven distribution introduced here: their behavior over varying 
circumstances of study design. It uses simulation methodology to generate residential 
distributions over a wide range of circumstances and shows that the unbiased ver- 
sions of popular indices introduced in this chapter behave as desired in circum- 
stances where bias renders scores for standard versions of the indices untrustworthy 
and potentially misleading. It also shows that unbiased indices are attractive because 
they near-exactly replicate the behavior of standard versions of indices in situations 
where bias is negligible and they yield clearly superior assessments of segregation 
in situations where the impact of bias on standard versions of indices is 
non-negligible. 
Chapter 16 
Comparing Behavior of Unbiased 
and Standard Versions of Popular Indices 


In the previous chapter I outlined the rationale for unbiased versions of indices of 
uneven distribution. Additionally, I presented results from analysis of expected 
group residential distributions under a binomial probability model to establish that 
the unbiased versions of popular indices have expected values of zero when residen- 
tial distributions are random. In this chapter I report analyses of the behavior of 
standard and unbiased versions of indices of uneven distribution to document two 
things: the potential undesirable impact of bias on the scores of standard versions of 
indices and the attractive behavior of the scores of unbiased versions of the same 
indices.¹ 

To document index behavior I conducted a series of simulation experiments to 
systematically “exercise” standard and unbiased versions of popular indices under a 
wide range of demographic contexts and neighborhood definitions. I performed the 
analyses using residential distributions generated by SimSeg, a computational 
model that simulates residential segregation dynamics. The SimSeg program has 
been described in more detail elsewhere (e.g., Fossett and Waren 2005; Fossett 
2006, 2011a, b; Fossett and Dietrich 2009; Clark and Fossett 2008). Examining 
results generated by the SimSeg program is useful for the purposes of this chapter 
for two reasons. First, the program implements routines that calculate both standard 
and unbiased versions of G, D, R, H, and S. Second, the program can systematically 
generate residential distributions over a wide range of study designs that can reveal 
how the behavior of standard and unbiased versions of indices differ under varying 
circumstances. 

¹ The analyses I report in this chapter elaborate and extend analyses I conducted for an earlier study 
on this topic (Fossett and Zhang 2011). 

Using SimSeg I designed and executed simulation experiments that implemented 
a two-group city in which segregation is assessed using bounded neighborhoods of 
uniform size. The two groups in the simulation are of course “virtual”, but for 
convenience of discussion and consistency with examples discussed in earlier chapters 
I refer to them as “White” and “Black”. I varied the conditions of the experiments 
to exercise index behavior by varying the racial mix of the city randomly from 2 to 
98 % White separately in each experiment. I then ran 2,500 experiments separately 
for each of nine neighborhood sizes based on a square housing grid for the bounded 
area, ranging from 3 to 10 houses on a side plus a larger grid of 15 houses on a side. 
The resulting neighborhood sizes were 9, 16, 25, 36, 49, 64, 81, 100, and 225. The 
simulation experiments conducted using these varying settings for neighborhood 
size and city racial composition were relatively simple.² The program first created 
the relevant virtual neighborhoods and housing units within them. Next it created 
the virtual population of households according to the racial demography setting. It 
then distributed households randomly across housing units. Then, it calculated and 
recorded a battery of segregation index scores including scores for standard versions 
of all popular measures of uneven distribution (G, D, A, R, H, and S) and unbiased 
versions for G, D, R, H, and S.³ 
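
The steps just listed can be mimicked in miniature. The sketch below is not SimSeg and makes no claim about how that program is implemented; it is a hypothetical stand-in with assumed names and settings (three neighborhood sizes, 40 replications per size, roughly 7,200 households, a fixed seed) that follows the same outline: create neighborhoods of uniform size, draw a city racial mix between 2 and 98 % White, assign households at random, and compute standard and neighbor-based versions of D on the 0-1 scale.

```python
import random

def dissimilarity_scores(areas):
    """Return (D, D_prime) as differences of group means on 0-1 scored contact."""
    N = sum(len(a) for a in areas)
    W = sum(sum(a) for a in areas)
    B = N - W
    yw = yb = yw_n = yb_n = 0
    for a in areas:
        t, w = len(a), sum(a)
        b = t - w
        at_parity = w * N >= W * t                # w/t >= W/N, in integer form
        yw += w * at_parity                       # standard: self included
        yb += b * at_parity
        yw_n += w * ((w - 1) * N >= W * (t - 1))  # neighbors only, White household
        yb_n += b * (w * N >= W * (t - 1))        # neighbors only, Black household
    return yw / W - yb / B, yw_n / W - yb_n / B

def run_experiments(sizes=(9, 25, 100), reps=40, target_households=7200, seed=2017):
    """Mean D and D' under random assignment for each neighborhood size (assumed settings)."""
    rng = random.Random(seed)
    results = {}
    for t in sizes:
        N = (target_households // t) * t
        d_vals, dp_vals = [], []
        for _ in range(reps):
            n_white = round(N * rng.uniform(0.02, 0.98))   # city racial mix
            households = [1] * n_white + [0] * (N - n_white)
            rng.shuffle(households)                        # random assignment
            areas = [households[i:i + t] for i in range(0, N, t)]
            d, dp = dissimilarity_scores(areas)
            d_vals.append(d)
            dp_vals.append(dp)
        results[t] = (sum(d_vals) / reps, sum(dp_vals) / reps)
    return results

results = run_experiments()
for t, (mean_d, mean_dp) in results.items():
    print(f"size {t:3d}: mean D = {100 * mean_d:5.1f}   mean D' = {100 * mean_dp:5.1f}")
```

In runs of this kind the standard score is expected to sit well above zero and to fall as neighborhood size grows, while the neighbor-based score is expected to stay close to zero.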

Tables 16.1 and 16.2 report the means and standard deviations for scores for 
standard versions of indices of uneven distribution under random distribution at the 
initialization of the city landscape over varying conditions of effective neighbor- 
hood size (ENS) and percent White for the city (P). For economy of presentation, 
results are given only for ENS settings of 9, 16, 25, 49, 100, and 225. Inspection of 
Table 16.1 shows that the level and pattern of index scores vary systematically by 
index and over different combinations of settings for ENS and P. Inspection of the 
results presented in Table 16.2 shows that index scores vary in a relatively narrow 
range around the mean for index scores under any particular combination of settings 
for ENS and P and the results also show that the degree of dispersion in index scores 
is generally similar in magnitude across different indices. 

Figure 16.1 provides visual documentation of the patterns of index behavior 
summarized in Tables 16.1 and 16.2. The figure provides separate graphs for each 
index considered; namely, G, D, A, R, H, and S. Each graph plots the values of the 
relevant index score calculated from the random residential distribution at the begin- 
ning of the simulation experiment (i.e., cycle 0) against percent White in the city 
population (P). The graphs plot index scores for the simulations where effective 
neighborhood size (ENS) is set to values of 9, 16, 25, 49, and 100. In addition, each 
graph also plots a black line tracing the expected index score (e.g., E[D]) based on 
calculations using a binomial model (per Winship 1977). To reduce visual clutter 
and facilitate clarity of patterns, the graphs do not depict results for ENS settings of 


? Other details of the simulations are uniform across all simulations and have no impact on results. 
For example, neighborhoods are arranged to form an approximately circular form for the overall 
city. The dimensions of the city were calibrated to yield between 6400 and 8500 virtual households 
depending on number of households per neighborhood and the number of neighborhoods in the 
simulated city. The resulting virtual cities are similar in form to those described in Fossett (2006, 
2011a, b). I conducted additional simulations using larger cities with more neighborhoods and 
more virtual households. All relevant index behavior was fundamentally similar. So I used smaller 
cities to keep the computational burdens for generating the analysis data sets at reasonable levels. 


³ Results for an unbiased version of Atkinson’s A are not shown because I have not been able to 
place this index in the difference of means framework. Hutchens R is a closely related measure. 

Table 16.1 Means for standard versions of popular indices of uneven distribution computed for 
random residential distributions under varying combinations of relative group size (P) and 
neighborhood size 

                            Neighborhood size
Index               Pᵃ            9     16     25     49    100    225
Gini (G)            <5         81.4   70.5   60.1   45.2   33.0   22.0
                    11-15      56.5   43.7   35.1   25.1   17.6   11.7
                    36-50      40.2   30.1   24.0   17.1   12.0    8.0
Dissimilarity (D)   <5          TED   62.8   48.0   33.4   24.0   15.6
                    11-15      40.8   31.6   25.2   17.9   12.5    8.3
                    36-50      28.7   21.4   17.1   12.2    8.5    5.7
Atkinsonᵇ (A)       <5         78.4   64.4   49.8   28.4   12.9    4.5
                    11-15      41.9   23.3   13.0    5.5    2.6    1.1
                    36-50      14.5     ho    4.7    2.4    1.1    0.5
Hutchens (R)        <5         54.0   40.8   29.6   15.6    6.7    2.3
                    11-15      23.8   12.4    6.7    2.8    1.3    0.5
                    36-50       7.6    3.8    2.4    1.2    0.6    0.3
Theil (H)           <5         34.8   24.1   16.9    9.0    4.4    1.8
                    11-15      18.8   10.5    6.4    3.1    1.5    0.6
                    36-50       9.9    5.3    3.3    1.7    0.8    0.4
Separation (S)      <5         12.3    6.9    4.4    2.2    1.1    0.5
                    11-15      12.3    7.0    4.4    2.3    1.1    0.5
                    36-50      12.3    6.9    4.4    2.3    1.1    0.5

ᵃ Here P denotes the city-wide group percentage for the smaller group in the comparison 
ᵇ The Atkinson index (A) is computed with δ set at 0.5, the value at which A is “symmetric” 


36, 64, 81, and 225. However, Table 16.2 documents that the results for these set- 
tings are consistent with the results shown in the figure. For example, the means for 
index scores when ENS is set at 36, 64, and 81 fall between the scores for ENS set- 
tings immediately above and below the ENS setting in question. 

The results presented in Fig. 16.1 document several clear patterns. First, all of the 
indices take values above zero in each and every simulation trial reflected in the 
12,500 data points plotted in the figure. The gray points for individual simulation 
trials indicate that index scores calculated from the random residential distributions 
at initialization in individual simulation trials vary in relatively narrow ranges 
around their expected values based on binomial theory. The black lines show that 
the expected values of the indices based on analytic calculations vary systematically 
with effective neighborhood size and percent White in the city population. As noted 
earlier, the nature of the systematic variation in index scores is simple in its main 
features. For all indices, scores for both the expected values under random assign- 
ment and the observed random segregation at initialization in the simulations are 
systematically higher when effective neighborhood size (ENS) is lower. Thus, the 
highest curve is for the set of simulations that use the lowest value of ENS (in this 
case 9) and the curves move systematically lower as ENS moves to successively 
higher values. Also, for all indices except the separation index (S), both the expected 

Table 16.2 Standard deviations for standard versions of popular indices of uneven distribution 
computed for random residential distributions under varying combinations of relative group size 
(P) and neighborhood size 

                            Neighborhood size
Index               Pᵃ            9     16     25     49    100    225
Gini (G)            <5          4.9    6.6    7.2    7.4    5.7    3.9
                    11-15       2.4    2.4    2.4    2.1    1.4    1.0
                    36-50       1.2    1.2    1.2    1.2    0.9    0.6
Dissimilarity (D)   <5          6.4    9.6   10.3    5.9    4.2    2.8
                    11-15       1.8    2.2    1.9    1.5     ta    0.8
                    36-50       0.9    0.9    0.9    0.9    0.6    0.4
Atkinsonᵇ (A)       <5          6.1    8.9   10.4   10.7    6.3    2.0
                    11-15       3.8    3.3    2.5    1.1    0.4    0.2
                    36-50       1.2    0.6    0.5    0.3    0.2    0.1
Hutchens (R)        <5          6.7    7.6    7.5    6.5    3.4    1.0
                    11-15       2.5    1.9    1.3    0.6    0.2    0.1
                    36-50       0.6    0.3    0.2    0.2    0.1    0.1
Theil (H)           <5          3.9    3.9    3.5    2.7     AS    0.6
                    11-15       1.4    1.1    0.9    0.5    0.2    0.1
                    36-50       0.6    0.4    0.3    0.2    0.1    0.1
Separation (S)      <5          0.7    0.5    0.4    0.3    0.2    0.1
                    11-15       0.6    0.5    0.4    0.3    0.2    0.1
                    36-50       0.7    0.5    0.4    0.3    0.2    0.1

ᵃ P is the city-wide, pairwise group percentage for the smaller group in the comparison 
ᵇ The Atkinson index (A) is computed with “tuning” value δ set at 0.5, the value at which A is 
symmetric 


values and the observed random outcomes are systematically higher when propor- 
tion White for the city (P) departs from balance at 0.50 and the expected values and 
observed outcomes take especially high values when P falls below 0.10 or rises 
above 0.90. 
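
The dependence of expected scores on neighborhood size and group balance can be checked with a short calculation. The sketch below is illustrative only: it is not the SimSeg code and is a simplified stand-in for, not a reproduction of, the binomial calculations attributed to Winship (1977). It assumes that every area holds exactly n households, that the White count in each area is an independent binomial draw with probability P, and that E[D] can be approximated by E|p - P| / (2P(1 - P)). In a two-group city, n plays the role of effective neighborhood size. The function name expected_d is hypothetical.

```python
from math import comb


def expected_d(n, p):
    """Approximate E[D] (0-1 scale) under random assignment.

    Assumes equal-size areas of n households and a binomial model for the
    number of White households per area (an illustrative simplification).
    """
    q = 1.0 - p
    # Mean absolute deviation of the area proportion White from P.
    mad = sum(
        comb(n, k) * p**k * q**(n - k) * abs(k / n - p)
        for k in range(n + 1)
    )
    return mad / (2.0 * p * q)


if __name__ == "__main__":
    # Tabulate the approximation (in percentage points) by area size and P.
    for n in (9, 16, 25, 49, 100):
        row = ", ".join(
            f"P={p:.2f}: {100 * expected_d(n, p):5.1f}"
            for p in (0.05, 0.15, 0.50)
        )
        print(f"n={n:3d}  {row}")
```

Under these assumptions the approximation falls as n increases and rises as P moves away from 0.50, which is the qualitative pattern described above.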

The existing methodological literature has documented similar patterns of ran- 
dom variation for D many times before and also occasionally for G. But reports on 
patterns of variation for expected values of A, R, H, and S under random assignment 
are rare if they exist at all. To my knowledge, the results presented here are the first 
to systematically compare the bias behavior of all popular indices of uneven 
distribution. 

Comparing the figures for each index reveals several noteworthy differences in 
their behavior under random assignment. One obvious pattern is that indices vary 
considerably in the magnitude of bias under random assignment. The highest 
expected values under random assignment are observed for G followed closely by 
D and then A. The lowest scores under random assignment are for S. H and R have 
the next lowest scores for expected values. The “takeaway” point here is that D, the 
most popular and widely used index of uneven distribution, has higher expected 
values under random assignment than all other indices except G. 

Another clear pattern is that expected values under random assignment are lower 
for every index when effective neighborhood size (ENS) is larger. One additional 
finding is that, for all indices except S, index bias is highest, often alarmingly so, 
when group size is imbalanced. These findings provide at least some justification 
for two crude rules-of-thumb for research designs used in many segregation studies. 
One practice is that most studies in recent decades examine segregation scores cal- 
culated using data for spatial units with larger population counts (e.g., use tracts 
over blocks). This tends to promote, but does not guarantee, higher levels of effec- 
tive neighborhood size which, all else equal, serves to reduce bias. Another practice 
is that studies often avoid analysis of comparisons involving groups that are small 
in relative population size. All else equal, this tends to exclude comparisons where 
bias is likely to be larger. 

Another common practice in the empirical literature is to avoid analysis of com- 
parisons that involve groups that are small in absolute population size. The results 
presented here provide no support for this practice. Analytic formulas for bias (e.g., 
Winship 1977) identify a clear role of neighborhood size and relative group size but 
they do not identify a role for absolute group size. Empirically, absolute size may be 
correlated with relative group size but only relative size has a consequence for index 
bias. So if one is screening cases on relative group size there is no justification for 
additional screening on absolute size, at least not for the purpose of avoiding 
problematic bias.⁴ 

Similarly, there is no support in these results for the practice of “dealing with 
bias” by weighting cases in aggregate-level analyses by the size of the minority 
population. Absolute group size has no bearing on bias. Accordingly, weighting 
cases on minority size serves only to skew results toward findings for cities with 
larger minority populations. 

Figure 16.1 also reveals a few findings that are not currently widely appreciated. 
One is that for most indices, and especially for G and D, effective neighborhood size 
(ENS) and group ratio (GR) interact such that index bias is especially high when 
ENS is low and GR is highly imbalanced. This has an important practical implica- 
tion. It indicates that the standard rules-of-thumb commonly used in restricting 
analysis samples in empirical studies are crude and are not necessarily reliable for 
their intended purpose of identifying cases prone to high levels of bias. The standard 
rules of thumb are crude first because they are applied using “rough-and-ready” cut 
points when bias behavior varies continuously across ENS and GR and second 
because the rules are applied in a simple additive way and do not take account of the 
important interaction between ENS and GR that is so clear in these results. As a 
result, the prevailing practices can easily exclude cases where bias may be low enough 
to be viewed as negligible (e.g., E[s] < 2-3), particularly when using R, H, and 


⁴ In Chap. 8 I noted that screening on absolute group size may be relevant for other reasons. For 
example, if group size is insufficient to “fill” 3-5 of the spatial units used in measuring segregation, 
scores for indices that measure residential polarization will likely be biased downward because the 
spatial units being used may be too large to capture concentrated displacement for the group in 
question. 

S. Conversely, they can sometimes include cases where bias is high and problematic, 
particularly when using G, D, and A. 

In sum, not only do current practices for dealing with bias greatly restrict the 
scope of segregation studies, they also are likely to be less reliable and effective for 
their intended purpose than researchers may realize. If researchers apply these prac- 
tices in future research, they should revise them to take account of the findings 
reported here. 


16.1 Documenting the Attractive Behavior of Unbiased 
Versions of Indices of Uneven Distribution 


I now review the behavior of the new unbiased versions of popular indices of uneven 
distribution under random distribution at the initialization of the city landscape. 
Tables 16.3 and 16.4 report the means and standard deviations, respectively, for the 
sampling distributions of scores for the unbiased versions of the indices over the 
simulations conducted over varying conditions of effective neighborhood size 
(ENS) and percent White in the city (P). Figure 16.2 documents these patterns visually 
with separate graphs for G', D', R', H', and S'.⁵ As with Fig. 16.1, each graph 
plots the values of the relevant index score at the beginning of the simulation exper- 
iment (i.e., cycle 0) against percent White in the city population. Also as before the 
individual graphs plot observed segregation outcomes from simulations in which 
effective neighborhood size (ENS) is variously set to 9, 16, 25, 49, and 100. I should 
note two important differences from Fig. 16.1. One is that the expected values of the 
unbiased indices (e.g., E[D']) all are zero under calculations using an “exact” bino- 
mial model (per Winship 1977). So the resulting plotted “curve” for the expected 
values for all of the indices is a horizontal straight line centered on zero on the vertical 
(y) axis of the figure. The other is that the vertical (y) axis of the 
figure covers a much smaller range of scores than in Fig. 16.1. This aids visual 
inspection of the patterns in Fig. 16.2. But it is important to take account of 
the difference when making visual comparisons with Fig. 16.1. The range of variation 
is much smaller in Fig. 16.2 but this is not visually obvious. 

The graphs in Fig. 16.2 show that the unbiased index scores based on the 12,500 
random residential distributions vary in an approximately bell-shaped distribution 
around zero and thus take both negative and positive values. The vertical dispersion 
of unbiased index scores around the expected value of zero gives intuitive insight 
into the expected sampling distribution of the scores for the unbiased versions of the 
different indices. The dispersion depicts the range and pattern of index scores that 
occur when there is no statistical association between race and residential location; 
that is, when residential distributions are random. Intuitively, this provides a basis 


⁵ There is no graph for Atkinson’s A' because I have not been able to place it in the difference of 
means framework. Hutchens R is a closely related index with an available difference of means 
formulation needed to develop an unbiased version of the index. 

Table 16.3 Means for unbiased versions of popular indices of uneven distribution computed for 
random residential distributions under varying combinations of relative group size (P) and 
neighborhood size 

                            Neighborhood size
Index               Pᵃ            9     16     25     49    100    225
Gini (G)            <5          0.1    0.5    0.4    0.7    1.0    1.5
                    11-15      -0.2    0.2    0.5    0.6    0.7    1.1
                    36-50       0.2    0.4    0.5    0.8    0.8    1.0
Dissimilarity (D)   <5          0.0    0.4    0.4    0.8    0.2    0.6
                    11-15      -0.0    0.1   -0.1    0.3   -0.1    0.1
                    36-50       0.2    0.2    0.1    0.1    0.1    0.1
Hutchens (R)        <5          0.1    0.2    0.1    0.1    0.1    0.1
                    11-15      -0.1    0.2   -0.0    0.0    0.0   -0.0
                    36-50       0.0    0.0    0.0    0.1    0.0    0.0
Theil (H)           <5          0.0    0.1    0.1    0.1    0.1    0.1
                    11-15      -0.0    0.1   -0.0    0.0    0.0   -0.0
                    36-50       0.0    0.0    0.0    0.0    0.0    0.0
Separation (S)      <5          0.0    0.0    0.0    0.0    0.0    0.0
                    11-15      -0.0    0.0   -0.0   -0.0    0.0   -0.0
                    36-50       0.0    0.0    0.0    0.0    0.0    0.0

ᵃ P is the city-wide, pairwise group percentage for the smaller group in the comparison 


Table 16.4 Standard deviations for unbiased versions of popular indices of uneven distribution 
computed for random residential distributions under varying combinations of relative group size 
(P) and neighborhood size 

                            Neighborhood size
Index               Pᵃ            9     16     25     49    100    225
Gini (G)            <5          4.9    5.3    5.8    6.7    5.2    3.2
                    11-15       3.3    3.5    3.6    3.6    2.5    1.9
                    36-50       2.4    2.4    2.4    2.4    1.8    1.2
Dissimilarity (D)   <5          4.8    5.4    5.8    6.9    6.1    4.3
                    11-15       3.3    3.6    3.9    4.6    3.5    3.3
                    36-50       2.5     XT    2.9    3.7    3.0    2.6
Hutchens (R)        <5          3.4    3.3    3.2    3.0    1.6    0.5
                    11-15       1.7    1.3    0.9    0.5    0.2    0.1
                    36-50       0.5    0.3    0.2    0.2    0.1    0.0
Theil (H)           <5          2.2    1.9    1.7    1.4    0.8    0.3
                    11-15       1.2    0.9    0.7    0.4    0.2    0.1
                    36-50       0.6    0.4    0.3    0.2    0.1    0.1
Separation (S)      <5          0.8    0.5    0.4    0.3    0.2    0.1
                    11-15       0.7    0.6    0.5    0.3    0.2    0.1
                    36-50       0.8    0.5    0.4    0.3    0.2    0.1

ᵃ P is the city-wide, pairwise group percentage for the smaller group in the comparison 

for evaluating observed scores for residential segregation. Observed scores that fall 
within the middle portion of the sampling distribution can easily occur by chance. 
But chance is a less plausible explanation for observed scores that fall in the low 
probability tails of the sampling distribution. Accordingly, scores in these regions 
are likely to reflect the impact of structured social processes that promote either 
greater or lesser segregation than would occur based on chance. 

Because the expected value for an unbiased index under the null hypothesis of no 
association between race and residential location is zero and the sampling distribu- 
tion is bell-shaped, one half of the values in the sampling distribution of an unbiased 
index will be negative. Some segregation researchers may not be initially comfort- 
able with seeing negative scores for unbiased indices. But negative scores have a 
straightforward interpretation on both narrow statistical grounds and also on sub- 
stantive grounds. On statistical grounds negative scores indicate that scores for the 
standard version of the index take values that are lower than would be expected 
under random assignment. Under the null hypothesis, negative values that fall in the 
middle region (e.g., in the middle 95% region) of the sampling distribution for 
unbiased index scores can be set aside in the usual way; they can be attributed to 
chance and the observed departure from the expected value of zero can be viewed 
as not statistically significant. In contrast, negative scores that fall in the left tail of 
the sampling distribution can be viewed as statistically significant; they are unlikely 
to occur by chance and thus invite a substantive sociological explanation of how 
(scaled pairwise) contact with Whites among neighbors could come to be higher on 
average for Blacks than for Whites. 

I note below that interesting sociological explanations are available. But I first 
pause to note that unbiased indices necessarily take negative values under exact 
even distribution. For example, consider the values of the standard and unbiased 
versions of the separation index for a city that is 90% White and 10% Black and has 
exactly 10 households per block. Under exact even distribution every block will 
have nine White households and one Black household. Proportion White among 
neighbors differs by race and will be 0.889 (i.e., 8/9) for every White household and 
1.000 (i.e., 9/9) for every Black household. In contrast, proportion White for area 
population will be 0.900 (i.e., 9/10) for every White and every Black household. 
Accordingly, the standard version of S will be zero but the unbiased version S’ will 
be -0.111. 

The comparison on D would be even more extreme. The value of the standard 
version of D would again be zero. But the value of the unbiased version D' would 
be -1.000 because all White households are scored 0 on attaining parity (i.e., 0.90 or 
higher) on proportion White among neighbors while all Black households are 
scored 1.⁶ 
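
The arithmetic in this example can be reproduced directly. The sketch below is a minimal illustration, assuming the difference-of-means scoring described in the text: each household is scored on proportion White in its area (standard version) or among its neighbors excluding itself (unbiased version), S compares group means of that proportion, and D compares group means of a 0/1 indicator for reaching the city proportion White. The function and variable names are hypothetical, and exact fractions are used only to avoid rounding at the parity threshold.

```python
from fractions import Fraction


def index_scores(areas, unbiased):
    """Return (S, D) as White mean minus Black mean of household scores.

    areas: list of (white_count, black_count) tuples, one per area; each
           area is assumed to hold at least two households.
    unbiased: if True, score each household on its neighbors only
              (the household itself is excluded from the area counts).
    """
    total_w = sum(w for w, b in areas)
    total_b = sum(b for w, b in areas)
    city_p = Fraction(total_w, total_w + total_b)  # city proportion White

    # For each group: [sum of proportion White, count of households at parity]
    sums = {"W": [Fraction(0), 0], "B": [Fraction(0), 0]}
    for w, b in areas:
        for group, count in (("W", w), ("B", b)):
            if count == 0:
                continue
            own_white = 1 if group == "W" else 0
            if unbiased:
                p = Fraction(w - own_white, w + b - 1)
            else:
                p = Fraction(w, w + b)
            sums[group][0] += p * count
            sums[group][1] += count * (1 if p >= city_p else 0)

    s = sums["W"][0] / total_w - sums["B"][0] / total_b
    d = Fraction(sums["W"][1], total_w) - Fraction(sums["B"][1], total_b)
    return s, d


if __name__ == "__main__":
    even_city = [(9, 1)] * 100  # 100 blocks, each 9 White and 1 Black
    print(index_scores(even_city, unbiased=False))
    print(index_scores(even_city, unbiased=True))
```

For the city in the example, the standard call returns zero for both indices, while the unbiased call returns -1/9 (about -0.111) for S' and -1 for D', matching the values given above.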

These negative values for unbiased indices under conditions of exact even distri- 
bution will be unfamiliar and perhaps also surprising to most readers, but they are 


⁶ The example serves to highlight a difference between D and S; namely, that, whether in standard 
or unbiased form, D responds much more strongly than S to quantitatively small deviations from 
parity on racial proportions. 

fully expected and have a clear substantive interpretation. Negative values result 
because exact even distribution — the zero point for standard measures of uneven 
distribution — is a highly unexpected outcome under random distribution. The occur- 
rence of such an unexpected residential distribution invites a sociological explana- 
tion identifying the structured social process that could bring about exact even 
distribution. Ready examples could include social dynamics such as quota systems 
in state policies governing assignments of households to housing units or institu- 
tional housing policies that structure housing assignments in dorms at colleges and 
universities, public housing, barracks in military bases, juvenile detention facilities, 
jails and prisons, orphanages, institutions for persons with disabilities, and the like. 
Thus, statistically significant negative values for unbiased indices are not only pos- 
sible, they can and should obtain in certain empirical settings (albeit not ones that 
are commonly studied) where group distributions are highly structured to produce 
even distribution. Thus, negative scores for unbiased indices are valid and carry a 
clear sociological meaning. 

Table 16.4 and Fig. 16.2 document patterns of dispersion in scores for unbiased 
indices under random distribution. The main differences across the five unbiased 
indices are seen in three areas. The first is the general level of volatility in the dis- 
persion of scores around the expected value of zero. Holding simulation conditions 
constant, scores for G’ and D’ consistently exhibit greater variability under random 
assignment; scores for R’ and H’ exhibit less variability; and scores for S' exhibit the 
lowest variability of all. 

Another interesting pattern in the sampling distributions of the unbiased indices 
is how the dispersion of index scores under random distributions varies with effec- 
tive neighborhood size (ENS). Table 16.4 documents that variability in the distribu- 
tion of scores around zero is greater when effective neighborhood size (ENS) is 
small. This pattern is highlighted in visual form in Fig. 16.2 by plotting the points 
in successively darker shades of gray as ENS increases in size from 9-16 to 25-49 
to 100, producing a concentration of the darkest points near the center of the 
distribution. 

A third pattern in the sampling distributions of the unbiased indices is how the 
dispersion of index scores under random distributions varies with city racial propor- 
tion; in this case proportion White in the city (P). Here the unbiased separation 
index (S’) stands apart from the other indices. Other things equal, the dispersion in 
the scores for S’ is constant across levels of percent White in the city (P). In contrast, 
a much different pattern holds for G’, D’, R’, and H’; they all exhibit greater disper- 
sion in index scores when percent White in the city (P) departs further from balance 
(i.e., 50). Figure 16.2 documents that the increase in the magnitude of the dispersion 
in index scores becomes especially pronounced when P begins to approach the 
bounds of 0 and 100. 

I offer the following intuitive explanation for these patterns. The pattern of dis- 
persion in values of the unbiased version of the separation index (S') serves as a 
ready benchmark. Variation in dispersion is a simple function of effective neighbor- 
hood size. This is easy to understand; smaller samples of neighbors lead to greater 
volatility in residential outcomes. Dispersion in S’ is unaffected by relative group 
size because values of unbiased contact (p’) map onto segregation-determining scores 
for residential outcomes (y’) without change. For all other indices, the scaling func- 
tions mapping scores of p’ onto scores of y’ are nonlinear. This assures that random 
deviations of p’ from P will be exaggerated. Furthermore, because nonlinearity in 
the scaling functions is stronger when group size is imbalanced, the impact will be 
greater when group size is more imbalanced. 

Finally, it is important to note that Fig. 16.2 documents that scores for unbiased 
indices are distributed symmetrically around zero at all levels of effective neighbor- 
hood size (ENS) and all levels of percent White for the city (P). So, while the mag- 
nitude of dispersion for scores for unbiased indices varies across indices and over 
study conditions, the expected value (zero) and shape of dispersion in scores (sym- 
metrical and bell-shaped) remain constant for all of the indices. 


16.1.1 Summary of Behavior of Unbiased Indices 


In sum, under random distribution, dispersion in scores of unbiased indices varies in 
magnitude depending on the particular index, the value of effective neighborhood 
size (ENS), and, with the lone exception of S’, percent White in the city (P). These 
patterns indicate that one must be mindful of these distinctive sampling distribu- 
tions for different indices when evaluating the statistical significance of particular 
index scores. Exact analytic solutions for standard errors of unbiased index scores 
under varying circumstances have not yet been established. For exploratory analysis 
“t” and “Z” tests for group differences of means on scaled contact with the reference 
group may perhaps serve as reasonable approximations. For more definitive assess- 
ments, researchers should use bootstrapping or other similar computation-intensive 
approaches that require less stringent assumptions regarding the nature of error 
distributions. 
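
One computation-intensive option can be sketched as follows. The code below is an illustrative randomization test rather than a procedure taken from the text: it assumes equal-size areas, scores each household on proportion White among its neighbors (excluding itself), computes S' as the White mean minus the Black mean, and builds a reference distribution by repeatedly shuffling households across housing units. All names and default settings are hypothetical.

```python
import random


def unbiased_s(labels, area_size):
    """S' = White mean minus Black mean of proportion White among neighbors.

    labels: list of "W"/"B" household labels; consecutive blocks of
            area_size labels form one area. Assumes area_size >= 2, that
            len(labels) is a multiple of area_size, and that both groups
            are present.
    """
    sums = {"W": 0.0, "B": 0.0}
    counts = {"W": 0, "B": 0}
    for start in range(0, len(labels), area_size):
        area = labels[start:start + area_size]
        whites = area.count("W")
        for label in area:
            own = 1 if label == "W" else 0
            sums[label] += (whites - own) / (len(area) - 1)
            counts[label] += 1
    return sums["W"] / counts["W"] - sums["B"] / counts["B"]


def randomization_test(labels, area_size, reps=500, seed=0):
    """Return (observed S', two-sided p-value, list of null scores)."""
    rng = random.Random(seed)
    observed = unbiased_s(labels, area_size)
    shuffled = list(labels)
    null_scores = []
    for _ in range(reps):
        rng.shuffle(shuffled)  # random reassignment of households to units
        null_scores.append(unbiased_s(shuffled, area_size))
    extreme = sum(1 for s in null_scores if abs(s) >= abs(observed))
    p_value = (extreme + 1) / (reps + 1)
    return observed, p_value, null_scores
```

Because the reference distribution is generated from the same city size, area size, and group proportions as the observed data, it reflects the index-specific dispersion discussed above without relying on a normal approximation.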


16.2 Documenting Additional Desirable Behavior 
of Unbiased Indices Based on the Difference of Means 
Formulation 


I now review the behavior of standard and unbiased versions of popular indices of 
uneven distribution in multi-group situations. My purpose is to show that “norming” 
adjustments proposed by Winship (1977) and Carrington and Troske (1997) and 
discussed in Chap. 14 can be problematic in these situations while the unbiased 
indices that I introduce here behave in desirable ways. 

The essence of the problem with norming adjustments is that the expected values 
of indices under random assignment are more complicated in multi-group situations 
than previous methodological discussions have acknowledged. The logic of 
performing “norming” adjustments proposed previously in the literature rests on the 
crucial assumption that the expected value of standard indices under random distri- 
bution is invariant (is a constant) under a given combination of area size and (pair- 
wise) group proportions. Unfortunately, this assumption is not correct. Instead, the 
expected value of standard indices is uncertain and can vary substantially even 
when area size and group proportions are known and simple in nature (e.g., all areas 
are constant size). The variation in index behavior traces to the presence of other 
groups in the population; the residential distributions for these groups can have non- 
trivial impacts on expected values of standard indices. This possibility ultimately 
undermines the potential effectiveness of previously proposed procedures for per- 
forming norming adjustments to deal with the impact of bias on the scores of stan- 
dard indices. 

I present results from simulation analyses conducted using the SimSeg simulation 
model to highlight the complex problems of bias in standard indices. The simulations 
all involve three groups: one large majority group and two smaller minority 
groups. At the initialization of each simulation trial the households in the majority 
group are highly segregated from the households in the two minority groups but the 
households in the two minority groups are randomly distributed in relation to each 
other. This is depicted in the top panel in Fig. 16.3.⁷ The simulation is then run for 
ten cycles (i.e., time periods). During each cycle, 25% of households are chosen at 
random and are assigned randomly to a new residential location. Not surprisingly, 
systematic segregation between the majority group and the two minority groups 
quickly dissipates under this process of random movement resulting in majority 
households being randomly intermixed with minority households. This is depicted 
in the bottom panel in Fig. 16.3. At all times, starting at initialization and continuing 
to conclusion, the households in the two minority groups are randomly distributed 
in relation to each other. 

The simulation experiments I used to generate the results for the analysis here 
follow the general design used in the simulations described earlier. The simulations 
here use the same neighborhood size (25) and the same city size and area configura- 
tion (i.e., 256 areas and 6,400 housing units). The racial composition of the city is 
set at 80-10-10. A total of 2,500 separate simulation experiments are run using this 
setting. 

Index behavior is depicted in Fig. 16.4 which provides four graphs, two on the 
top row for the unbiased formulation of the dissimilarity index (D’) and two on the 
bottom row for the standard formulation of the dissimilarity index (D). The graphs 
in the left column depict majority-minority segregation; the graphs in the right 
column depict minority-minority segregation. The box plots in the top left graph 
show how D’ for the majority-minority comparison starts at very high levels and 
falls to zero as the ten cycles of random movement dissipate the initial segregation 
at the start of the simulation. The box plots in the top right graph show that the distributions 
of D’ for minority-minority segregation are always centered on zero as 


⁷ Note that to facilitate visual inspection the example city depicted in the figure is smaller (about 
1/3rd size) than the city size used in the simulations. 

Fig. 16.3 Illustration of the transition from the initial state of minority-minority integration and 
high majority-minority segregation to the end state of all-way integration (random distribution) 
(Note: Households from the majority group and two minority groups are depicted in shades of gray 
(light, medium, and dark gray, respectively). Vacant housing units are in white. Grid lines delimit 
areas. For easy visual review, the city here is 40% the size of the city in the simulations but 
faithfully depicts city shape and residential patterns) 


expected since households in the two minority groups are distributed randomly in 
relation to each other over the entire course of the simulation. 

The box plots in the bottom left graph depict the distribution of scores for the 
standard version of the index of dissimilarity (D) for majority-minority segregation. 
This shows that D is very high at the beginning of the experiment and then falls 
sharply as households move randomly for ten cycles. But D does not fall to zero due 
to the intrinsic bias in D. Thus, the final level of D essentially reflects a “bootstrap” 
estimate of the expected value of D (E[D]) for majority-minority segregation under 
random assignment. The box plots in the bottom right graph depict the distributions 
of scores for D for minority-minority segregation. These reflect only random residential 
variation over the course of simulation. The surprising finding here is that D 

[Figure 16.4 panel titles: Majority-Minority Unbiased Delta (D') by Time Dimension of Simulation 
(Cycles); Minority-Minority Unbiased Delta (D') by Time Dimension of Simulation (Cycles)] 

Fig. 16.4 Box plots depicting distributions of scores for unbiased and standard delta index (D’ and 
D) for majority-minority segregation and minority-minority segregation over ten simulation cycles 
(Note: The graphs in the top row depict unbiased delta index (D') for majority-minority segregation 
on the left and minority-minority segregation on the right. The graphs on the bottom row 
depict values for standard delta (D) for the same comparisons. See text for details regarding the 
simulation designs) 


increases over the course of the simulation. Why does this occur when the two 
minority groups are distributed randomly in relation to each other over the entire 
simulation? The answer traces to the complicated nature of effective neighborhood 
size in residential patterns for cities with three or more groups. 

As illustrated in Fig. 16.3, the simulations begin with the two minority groups 
being highly segregated from the majority group. Under this pattern, effective 
neighborhood size (ENS) for the minority-minority segregation comparison is 
approximately 25 (i.e., the size of the neighborhoods) because households from the 
two minority groups live together in a small subset of the city’s areas where major- 
ity households are absent. But the value of ENS for the minority-minority compari- 
son changes over the course of the simulation. Under the final pattern of random 
distribution for all groups, effective neighborhood size (ENS) for minority-minority 
segregation falls to approximately 5 (i.e., 20% of the neighborhood size of 25).⁸ The 
change in ENS has important implications for the expected value of D under ran- 
dom assignment (i.e., E[D]) because E[D] is a negative function of effective neigh- 
borhood size. Consequently, over the course of the simulation, ENS falls from 25 to 
5 and the value of E[D] for the minority-minority segregation comparison increases. 
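
The shift in effective neighborhood size described above can be reproduced with a small simulation. The sketch below is not the SimSeg model; it is a simplified illustration that assumes 256 areas of 25 units with no vacancies, an 80-10-10 composition, an initial state in which the two minority groups are randomly mixed within a contiguous block of areas, and relocation implemented by shuffling movers among the units they vacate. It also operationalizes pairwise ENS as the mean combined count of the two minority groups over areas containing at least one such household, which is a simplifying assumption made for this illustration. All names are hypothetical.

```python
import random

AREA_SIZE, N_AREAS = 25, 256


def make_city(rng):
    """Units listed area by area; minorities 'A' and 'B' mixed, majority 'M' apart."""
    n_units = AREA_SIZE * N_AREAS
    n_minority = n_units // 10              # 10% for each minority group
    minority = ["A"] * n_minority + ["B"] * n_minority
    rng.shuffle(minority)                   # minorities random relative to each other
    return minority + ["M"] * (n_units - 2 * n_minority)


def area_counts(city):
    """Yield (count of A, count of B) for each area."""
    for start in range(0, len(city), AREA_SIZE):
        area = city[start:start + AREA_SIZE]
        yield area.count("A"), area.count("B")


def pairwise_ens(city):
    """Mean combined A+B count over areas with at least one A or B household."""
    sizes = [a + b for a, b in area_counts(city) if a + b > 0]
    return sum(sizes) / len(sizes)


def dissimilarity(city):
    """Standard D for the A-B comparison (0-1 scale)."""
    total_a, total_b = city.count("A"), city.count("B")
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in area_counts(city))


def random_move_cycle(city, rng, share=0.25):
    """Pick a share of households and reassign them at random among their units."""
    movers = rng.sample(range(len(city)), int(share * len(city)))
    households = [city[i] for i in movers]
    rng.shuffle(households)
    for unit, household in zip(movers, households):
        city[unit] = household


if __name__ == "__main__":
    rng = random.Random(42)
    city = make_city(rng)
    print("start:", round(pairwise_ens(city), 1), round(dissimilarity(city), 3))
    for _ in range(10):
        random_move_cycle(city, rng)
    print("end:  ", round(pairwise_ens(city), 1), round(dissimilarity(city), 3))
```

In runs of this sketch the pairwise ENS measure starts near 25 and ends near 5, and the standard D for the two minority groups rises even though they are randomly distributed relative to each other throughout.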

Figure 16.5 graphically summarizes results from additional analyses that replicate 
the analysis just reviewed using other racial demographic distributions 
for the virtual city. These are for group distributions of 80-15-5 and 91-6-3. 
The findings closely parallel those presented in Fig. 16.4. The results document two 
key findings. The first is that the unbiased version of D that is set forth in this study 
behaves in a desirable way under a wide range of conditions. The second is that the 
standard version of D behaves in an undesirable way under these same conditions. 

These findings document that previous suggestions by Winship (1977) and 
Carrington and Troske (1997) for dealing with index bias face a serious obstacle. 
They suggest adjusting observed values of D in relation to D’s expected value under 
random distribution based on the calculation D* = (D - E[D]) / (1 - E[D]). The
obstacle this approach faces is that the proposed adjustments can be effective only 
when the value of E[D], whether estimated by formula or by bootstrap methods, is 
accurate. Unfortunately, the results just reviewed show that the value of E[D] for the 
minority-minority segregation comparison is not a simple constant. In the simulations under
review here the two minority groups are distributed randomly in relation to each 
other. Accordingly, the value of D for this comparison reflects a bootstrap simula- 
tion estimate of E[D] for the minority-minority segregation comparison. The results 
from the simulations show that the value of E[D] is significantly impacted by an 
important factor that is not considered in previous discussions of potential solutions 
for dealing with index bias. Specifically, the value of E[D] is impacted by how the 
two groups in the comparison are distributed in relation to a third group — that is, the 
value of E[D] for the minority-minority comparison is impacted by how the two 
minority groups are distributed in relation to the majority group. In more general 
terms, the findings reviewed here indicate that E[D] for any two-group comparison 
is complicated in the multi-group situation and will be affected by: (a) the extent to 
which the two groups in the comparison are jointly segregated from other groups 
and (b) the relative size of other groups in the city population. 
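As a minimal illustration of the adjustment just described, the sketch below applies D* = (D - E[D]) / (1 - E[D]) to a single observed score under two different assumed values of E[D]. The function name `adjusted_d` and all numeric values are hypothetical; they are chosen only to show how strongly D* depends on the value of E[D] that is supplied.

```python
def adjusted_d(d_observed: float, d_expected: float) -> float:
    """Rescale D relative to its expected value under random distribution."""
    if not 0.0 <= d_expected < 1.0:
        raise ValueError("E[D] must lie in the interval [0, 1)")
    return (d_observed - d_expected) / (1.0 - d_expected)


# Hypothetical values: one observed D evaluated against two candidate E[D] values.
d_obs = 0.40
for e_d in (0.16, 0.35):
    print(f"E[D] = {e_d:.2f} -> D* = {adjusted_d(d_obs, e_d):.3f}")
```

If the E[D] used in the adjustment is taken from a simpler two-group setting while the E[D] that actually applies is higher, the adjusted score will overstate segregation for the comparison.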

Space does not permit a detailed review of the issue, but in analyses not reported 
here, I have found that this finding applies to all standard indices of uneven distribu- 
tion and that two broad conclusions hold in multi-group situations. One is that 
expected values of index scores under random assignment (i.e., E[*]) can potentially 
vary over wide ranges. The other is that adjustments of index scores in relation to 
expected values (E[*]) based on assumptions of simpler conditions can be inappro- 
priate and perform poorly. In the extreme the adjustments can generate assessments 
of segregation that are as problematic as the original unadjusted index scores. 


⁸ I say approximately because a precise discussion of effective neighborhood size would take
account of the city vacancy rate (which is 6 % in these simulations). 
This may help explain why adjustment methods such as those proposed by
Winship (1977) and Carrington and Troske (1997) are rarely used in empirical anal- 
yses. My own experience has been that the adjustment methods work quite well in 
methodological exercises where the underlying assumptions of the method are met 
(or closely approximated). However, when I apply the adjustments in the context of 
multi-group situations, they tend to “break down” and often yield unexpected results,
sometimes including results that are substantively implausible.

It is possible that the general approach of adjusting standard index scores could 
be “salvaged.” This could be accomplished by using more sophisticated methods to 
develop refined estimates of expected index values under random assignment (i.e., 
E[*]) that take account of the complications associated with population groups not
included in the segregation comparison. For example, I have found that bootstrap 
methods can be used to obtain serviceable situation-specific estimates of E[*]. One 
approach that appears to work well is to take the observed distribution across areas 
of the combined count of the two groups in the segregation comparison. Then per- 
form bootstrap simulations wherein households from the two groups in the com- 
parison are assigned randomly to areas until the observed area counts for the two 
groups combined are duplicated in each area. Performing a sufficiently large number
of bootstrap simulations (e.g., 1,000 or more) will then establish the expected
value of the index of interest under random assignment. 
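The bootstrap procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for the analyses in this monograph: the function names, the use of a label shuffle to carry out the random assignment, and the hypothetical city in the usage lines are all assumptions introduced here.

```python
import random


def dissimilarity(n1, n2):
    """Standard D = 0.5 * sum over areas of |n1_i/N1 - n2_i/N2|."""
    total1, total2 = sum(n1), sum(n2)
    return 0.5 * sum(abs(a / total1 - b / total2) for a, b in zip(n1, n2))


def bootstrap_expected_d(area_totals, group1_total, reps=1000, seed=0):
    """Estimate E[D] by assigning households randomly to areas while
    reproducing each area's observed combined (pairwise) count."""
    rng = random.Random(seed)
    group2_total = sum(area_totals) - group1_total
    # One label per household: 1 = Group 1, 0 = Group 2.
    labels = [1] * group1_total + [0] * group2_total
    draws = []
    for _ in range(reps):
        rng.shuffle(labels)
        n1, n2, start = [], [], 0
        for t in area_totals:
            in_area = sum(labels[start:start + t])  # Group 1 households placed here
            n1.append(in_area)
            n2.append(t - in_area)
            start += t
        draws.append(dissimilarity(n1, n2))
    return sum(draws) / reps


# Hypothetical city: 1,000 households in the comparison, half in each group,
# arranged so the combined count per area is either 25 or 5.
e_d_25 = bootstrap_expected_d([25] * 40, 500, reps=200)
e_d_5 = bootstrap_expected_d([5] * 200, 500, reps=200)
print(round(e_d_25, 3), round(e_d_5, 3))
```

Because each replication reproduces the observed combined counts area by area, the estimate reflects the effective neighborhood size for the comparison. In the hypothetical example, the estimate for areas holding 5 households from the two groups should be noticeably higher than the estimate for areas holding 25.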

Alternatively, one could apply formula-based methods to obtain expected values 
of indices. But the formulas would have to be refined to take into account the 
observed distribution of effective neighborhood size across areas of the city. This 
makes implementing the formulas more complicated and also more computation- 
ally demanding. 

Estimates of E[*] obtained in these ways are specific, not only to the nature of the 
multi-group residential pattern, but also to other potential complicating factors such 
as variation in area size. Unfortunately, most researchers are likely to view these 
technical refinements as exceedingly burdensome to implement. For example, in the 
simulation results just reviewed, the values of E[D] would have to be recalculated 
anew — using computation-intensive bootstrap methods or complex analytic compu- 
tations — at least at the beginning of every time period of the simulation and perhaps 
even more frequently in the early stages of the simulations when the empirically 
assessed value of E[D] is changing rapidly. For this reason, it is unlikely that
this approach will ever gain wide use. 

The good news is that the unbiased indices I introduce in this monograph provide 
a superior alternative. The approach I propose is effective in both simple and com- 
plicated conditions, is conceptually appealing and easy to understand, and is much 
easier to implement in empirical analyses. The new unbiased indices I propose 
eliminate the source of bias at its root cause and do not rely on “after the fact” 
adjustments to purge unwanted consequences of index bias. Accordingly, the 
expected values of the unbiased indices are zero regardless of whether other groups 
are present in the population and, if so, regardless of the nature of the residential 
segregation pattern between the two groups of interest and other groups. Indeed, the 
only impact I have been able to discern so far is that the dispersion of the sampling 
distribution of the unbiased indices is affected by the presence of other groups.
More specifically, while the mean for unbiased indices is always approximately 
zero, the standard error of the mean varies inversely with ENS as basic sampling 
theory would lead one to expect. But this pattern holds for the expected distributions 
of scores of both standard and unbiased versions of indices of uneven distribution 
and so does not diminish the advantage of using unbiased versions of indices. 


16.3 Conceptual and Practical Issues and Potential Impact 
on Research 


When should researchers use the new unbiased versions of indices of uneven distri- 
bution I have introduced here? One simple and reasonable answer is that researchers 
can and should use the unbiased versions of the indices in most if not all situations. 
Unbiased versions of index scores are not burdensome to compute; they support 
familiar substantive interpretations; they also expand available substantive interpre- 
tations; they eliminate concerns that index bias may distort findings; and they give 
researchers the option to expand research designs to consider a wider range of situ- 
ations where standard versions of index scores would be untrustworthy and 
misleading. 

Significantly, few, perhaps no, unwelcome consequences are associated with 
using unbiased indices. If standard versions of indices of uneven distribution are 
non-problematic, the unbiased versions of indices will closely replicate their scores.
This is because scores of unbiased indices differ from scores of standard indices in 
meaningful ways only when the scores for the standard indices are problematic. 
When this happens, the scores of the standard version of the index are called into
question as untrustworthy for many research purposes and the scores of the unbi- 
ased version of the index provide a more trustworthy assessment of the nature of 
group differences in residential distribution. 

Will using the unbiased versions of familiar indices lead to major changes in 
research findings? I answer this question in two parts. The first part of my answer 
begins by noting that studies conducted in recent decades have tended to use 
research designs that try to guard against index bias. I have characterized the strate- 
gies used as a patchwork of practices that can be criticized for being crude and in 
some cases weakly justified. But in general the strategies do tend to minimize the 
most egregious impacts of index bias. As a result, findings of many, perhaps most, previous
studies using standard indices are not necessarily likely to be contradicted in
dramatic ways if they are exactly replicated but using unbiased indices. I place 
emphasis on the phrase “exactly replicated” to stress that this means using exactly 
the same set of cases. Below I note that future studies may differ from past studies 
by being able to use a wider range of cases and more varied group comparisons 
instead of being limited to using the smaller, restricted set of cases and group com- 
parisons used in past research. 
The reason why the specific findings of many past studies are not likely to change
when exactly replicated using unbiased indices is straightforward. To the extent that 
the practices researchers have incorporated into research designs have been conser- 
vative and excluded cases that are most seriously affected by problems with index 
bias, replications that use unbiased versions of indices for the same cases will not be 
likely to yield dramatically different results. This is because the unbiased versions 
of indices yield scores similar to standard versions when bias is low. Substantively 
meaningful differences might arise for marginal cases that were not effectively 
screened because the ad hoc screening practices were crude and imprecise. But in 
many, perhaps most, studies these cases should not dominate the findings and so 
results will likely remain similar when the analysis is replicated using unbiased 
indices. 

Certain kinds of past studies would be most susceptible to changes in results if 
“exactly” replicated using unbiased versions of indices of uneven distribution 
instead of standard versions. These are studies where research designs were less 
stringent in screening out cases where index scores are most susceptible to bias. 
Examples would include: studies that use block-level data instead of tract data; 
studies that focus on segregation for groups that are imbalanced in size; studies that
focus on subgroups that are small in combined size; and studies that are based on
sample data instead of full count data. 

Another kind of study whose results might change when replicated using unbiased
measures is one where findings differ when cases are weighted by minority
population size in comparison to when cases are weighted equally. Presumably findings
do often differ. Otherwise the practice of weighting cases would not be so
widely used. Instead, an early study would report the finding that it makes no differ- 
ence and study designs would weight cases equally. The results reviewed here show 
that minority group size has no intrinsic relationship to bias. So the logical justifica- 
tion for weighting cases by minority group size to minimize the consequences of 
index bias can be questioned under all circumstances. The practice would clearly be 
unwarranted if studies are replicated using unbiased versions of indices. I suspect 
this might lead to some changes in findings. The current widespread practice of 
weighting by minority group size skews findings toward the cases in the sample that 
have larger minority populations. To the extent that this subset of cases has different 
segregation outcomes from the remainder of the cases, findings would change
when studies are replicated using unbiased versions of indices. 

A broader interpretation of the notion of study replication would lead to a different
answer. “Exact” replications of past studies involve excluding many cases that
can be included when using unbiased versions of indices. Similarly, “exact” replications
of past studies mean forgoing many group comparisons that can be examined
when using unbiased versions of indices. The availability of unbiased indices
frees the literature from the need to accept these past compromises in study design. 
With this in mind I now offer the second part of my answer. 

There are at least three ways that results for empirical studies are likely to change 
in welcome and potentially important ways when researchers adopt unbiased indi- 
ces. One is that using unbiased index scores will give researchers much greater 
ability to discuss and compare specific cases without concern for the distorting
influence of bias. These discussions are more difficult when standard scores are 
used. Scores for individual cases are potentially subject to different levels of distor- 
tion by index bias. Researcher recognition of this concern motivates the widespread 
practice of weighting cases differentially in statistical analyses. Concern about case- 
to-case variation in the impact of bias on index scores complicates the interpretation 
of scores of individual cities and it also complicates the direct comparison of scores 
for any given city with the scores of any other cities. Such complications are elimi- 
nated when using unbiased scores. Scores for individual cases can be evaluated with 
ease. Similarly, scores for two cases and scores for the same case at two points in 
time can be compared without concern. 

A second way results may change is that the logic of case weighting as imple- 
mented in statistical analyses in current studies will no longer be justified when 
using scores for unbiased versions of indices. The stated motivation for differen- 
tially weighting cases — that is, to minimize the distorting impacts biased cases may 
exert on findings — is of course negated entirely. The main implication of this is that 
results of statistical analyses will no longer be driven by segregation patterns for 
cities with large minority populations. It is unclear whether this will in fact lead to 
important changes in findings. But it is a distinct possibility that results of statistical 
analyses may differ because many cases which previously would have had little or 
no influence on results of statistical analyses will now carry equal weight. 

The third way using unbiased indices will impact segregation studies is the most 
important. It is that researchers will be free to greatly expand the scope of segrega- 
tion studies. Researchers will no longer need to limit analysis to the small subset of 
cities that survive sample restrictions and receive weights that give them dispropor- 
tionate influence on results after prevailing practices exclude and discount poten- 
tially problematic cases to guard against index bias. Instead, future studies will be 
able to conduct expanded analyses that may investigate segregation in many situa- 
tions that previously were not examined because conventions in restricting study 
designs foreclosed this possibility. Relatedly, using unbiased indices will allow 
researchers to consider many kinds of group comparisons that previously could not 
be considered. This includes, for example, comparisons involving small population 
groups and comparisons involving small subgroups within particular populations. 
In the past, such comparisons have gone unexamined because index scores are 
potentially subject to high levels of bias. These concerns can be set aside when 
unbiased versions of indices are used. 

Eliminating the need to impose draconian restrictions on research designs of 
segregation studies can only be a good thing. It will allow researchers to expand 
samples and explore a broader range of research questions. The following is a brief 
list of research applications where the benefits of using unbiased indices are espe- 
cially likely to be seen. 
• studies assessing segregation at small spatial scales such as the census block and
block group; or classrooms within schools; or the very small neighborhoods typically
used in agent simulation analyses of segregation⁹;

• studies assessing segregation when groups are imbalanced in size; for example,
studies of segregation involving small population groups such as Asian and
Latino populations in areas of new settlement; and

• studies assessing segregation for subgroups within broader populations which
will result in small effective neighborhood size; for example, the segregation of
Latino and Asian subgroups, and the segregation of high-income Whites and
high-income African Americans.


I conclude by strongly encouraging researchers to take advantage of the new 
option to use unbiased versions of popular indices of uneven distribution. One is 
never worse off for examining the new unbiased versions of popular indices and 
there are many ways they may yield benefits. Accordingly, I argue that it will always 
make good sense to examine the scores of the unbiased versions of indices. As I 
said, one can never be worse off for doing so because findings will be unchanged if 
bias is not a problem and the positive confirmation on this point will provide an 
additional basis for placing confidence in one’s findings. Moreover, there are many 
reasons to expect one would be better off, perhaps by a great deal, in comparison to 
following prevailing practices. Current “rule-of-thumb” practices that aim to mini- 
mize undesirable complications associated with index bias are crude and imprecise 
and can be “hit and miss” in effectiveness. Concerns on this point can be completely 
set aside by examining the unbiased versions of the indices even if one in the end 
elects to report results for standard versions of indices. However, it is unlikely that
standard indices will be used as often as in the past because the availability of unbi- 
ased versions of indices of uneven distribution makes it possible for researchers to 
examine segregation in a wider range of situations than was previously possible. 
Once this occurs, scores for standard indices will be even less trustworthy than they 
currently are and researchers will increasingly need to rely on unbiased versions 
when attempting to answer the new questions these measures permit researchers to 
investigate. 


⁹ In fact, I began pursuing the development of unbiased measures of uneven distribution to cope
with the problem of bias in measuring segregation in simulation studies. In that context, the unbi- 
ased measures allow researchers to explore a much wider range of combinations of neighborhood 
scale and population composition than can be considered using standard versions of segregation 
indices. 
Chapter 17
Final Comments 


In this study I have shown that popular measures of residential segregation — the 
dissimilarity index (D), the Gini index (G), the separation index (S), Theil’s index
(H), and Hutchens’ index (R), a measure closely related to Atkinson’s index (A) — 
can be cast as group differences of means on residential outcomes (y) scored from 
area group proportions (p). This approach yields results identical to those of previous
approaches to calculating index scores, so nothing is lost when adopting this formulation:
all past research findings can be reproduced and replicated. Importantly,
however, many benefits accrue from adopting the new approach to assessing 
segregation. 

One is that the approach serves to “demystify” aggregate segregation by reveal- 
ing its direct connections to residential outcomes for individuals. Segregation stud- 
ies generally have focused on describing aggregate distributions at the city level and 
have given little attention to the implications index scores have for the residential 
outcomes of the individuals in the groups being compared. This is very different 
from the approach that guides studies of group disparities in education, occupation, 
income, wealth, and other socioeconomic attainments. In these analyses, both the 
relevant attainment outcome and the attainment process that shapes its distribution 
are usually clearly in focus. Obviously, the literature on residential segregation rests 
on an implicit presumption that aggregate segregation arises from micro-level 
attainment dynamics that have consequences for the residential attainments of the 
individuals and households in the groups being compared. But, as Duncan and 
Duncan (1955) noted half a century ago, methodological approaches to measuring 
aggregate segregation have not pursued index formulations that can facilitate explor- 
ing these issues. The formulations I present here address this need by establishing 
that segregation indices reflect group differences on residential outcomes relating to 
group contact with differences between indices arising from differences in how they 
scale group contact. 

Another benefit of the approach I have outlined here is that it creates the possibil- 
ity of seamlessly joining the study of aggregate segregation with the study of 
residential attainments. Previously, the two were necessarily separate. Now it is
possible to directly investigate how residential attainment dynamics give rise to 
uneven distribution by framing aggregate segregation as the effect of race on indi- 
vidual-level residential attainments that additively determine the city-level segrega- 
tion score. This directly addresses the concern Duncan and Duncan (1955) raised 
that segregation indices serve to describe aggregate-level distributions but do not 
lend themselves to studying the underlying social processes that create these distri- 
butions. In addition, it creates new opportunities for refining segregation analysis by 
including non-racial characteristics in residential attainment models. This makes it 
possible to perform standardization and components analyses to investigate the 
extent to which segregation arises out of group differences in resources relevant for 
residential attainment and group differences in rates of converting their resources 
into residential attainments. It also makes it possible to use restricted access census 
micro files and non-census surveys to explore questions about aggregate segrega- 
tion that previously could not be explored. 

Joining the study of aggregate segregation with the study of micro-level residen- 
tial attainments also creates new options for investigating variation in segregation 
over time and across different metropolitan areas. If desired, city-level segregation 
can now be assessed by estimating the effect of race in city-specific individual-level 
models of residential attainment. Then the effect of race can itself be modeled as 
varying over time or with the ecological characteristics of metropolitan areas using 
multi-level models. The city-specific micro models can optionally include other 
social characteristics which may also influence residential outcomes. If not included, 
the effect of race in the model registers how aggregate segregation at the city-level 
varies with time and urban context. If included, the effect of race registers the level of 
and variation in racial segregation assessed net of controls for other characteristics. 
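The equivalence between an index score and the effect of race in an individual-level attainment model can be illustrated with a small sketch. It assumes the separation index (S), for which the residential outcome y is scored directly from the area proportion p, and it fits a bivariate least-squares regression of y on a group indicator with no other covariates. The area counts and the helper `ols_slope` are hypothetical and introduced only for illustration.

```python
def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical pairwise area counts for Group 1 and Group 2.
n1 = [90, 60, 30, 20]
n2 = [10, 40, 70, 80]

# Expand to one record per household: group indicator and residential outcome y = p.
race, outcome = [], []
for a, b in zip(n1, n2):
    p = a / (a + b)
    race += [1] * a + [0] * b
    outcome += [p] * (a + b)

effect_of_race = ols_slope(race, outcome)

# Difference of group means on y, computed directly for comparison.
mean_g1 = sum(y for r, y in zip(race, outcome) if r == 1) / sum(n1)
mean_g2 = sum(y for r, y in zip(race, outcome) if r == 0) / sum(n2)
print(round(effect_of_race, 4), round(mean_g1 - mean_g2, 4))
```

With a single binary predictor, the least-squares slope equals the difference of group means, so the coefficient reproduces the index score. Adding other individual characteristics to such a model would turn the coefficient into a net effect, as described above.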

The approach to assessing segregation I have outlined here establishes a new 
basis for discussing, comparing, and evaluating segregation indices — namely, the 
substantive and theoretical relevance of the residential outcomes (y) the index reg- 
isters and responds to. When evaluating indices in terms of their qualities for sum- 
marizing and describing group differences in residential outcomes, one may 
consider whether the residential outcomes they register are substantively compel- 
ling for individuals and households or for particular policy goals. When evaluating 
indices in terms of their relevance for investigating segregation dynamics, one may 
consider whether the residential outcomes they register are salient in residential 
attainment dynamics. Do indices register outcomes that individuals and groups seek 
and potentially compete for? That is, do individuals and groups strive for the out- 
comes because they value them for their own sake and/or because they are corre- 
lated with other valued residential outcomes? Are the outcomes consequential for 
important aspects of life chances? Are the outcomes relevant to theories of residen- 
tial attainment and stratification? 

In the body of this monograph I reviewed how different indices register residen- 
tial outcomes (y) based on scaling area group proportions (p) in different ways. D, 
G, H, and R score y in complicated ways. D scores y as a two-value step function 
based on P — the pairwise group proportion for the city in question. G scores y as an 
irregular nonlinear monotone function of p based on relative rank position (i.e., the
percentile transformation). H and R score y as continuous nonlinear functions of p. 
For all four indices, the scaling of residential outcomes varies, often quite dramatically,
with the racial mix of the city. To be clear, the functional forms of y = f(p)
for D, G, H, and R can, and often will, vary with the groups involved in the compari- 
son, across cities for the same group comparison, or over time for the same group 
comparison in a given city. In contrast, S scores y directly from p under all condi- 
tions. In this regard, S stands out as the only index for which the scoring of y based 
on p is the same across different group comparisons, over different points in time, 
and across different cities. Because of this quality, I am drawn to the one-to-one 
scoring of y = p that S registers. It is simple and easy to understand and it is related
to an aggregate segregation outcome that can easily be explained to non-technical 
audiences. In addition, there are good reasons to believe that the area group proportions
that S registers are meaningful to individuals and households and consequently
are salient in residential dynamics. However, I recognize that discussion and
debate on this issue are just beginning and I invite others to give attention to questions
concerning what substantive concerns about residential dynamics and group differ- 
ences in residential outcomes should guide the choice to focus on particular specifi- 
cations of aggregate segregation. 
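The contrast between the scoring used by D and the scoring used by S can be made concrete with a short sketch. It computes both indices as differences of group means on y and checks them against conventional aggregate formulas. The area counts and function name are hypothetical, and the treatment of areas where p exactly equals P (scored 1 here) is an assumption; such ties do not affect the resulting value of D.

```python
def index_as_mean_difference(n1, n2, score):
    """Mean of y for Group 1 households minus mean of y for Group 2 households,
    where y = score(p) and p is the area's pairwise Group 1 proportion."""
    total1, total2 = sum(n1), sum(n2)
    mean1 = mean2 = 0.0
    for a, b in zip(n1, n2):
        if a + b == 0:
            continue  # an empty area houses no one in the comparison
        y = score(a / (a + b))
        mean1 += a * y / total1
        mean2 += b * y / total2
    return mean1 - mean2


# Hypothetical pairwise area counts for Group 1 and Group 2.
n1 = [90, 60, 30, 20]
n2 = [10, 40, 70, 80]
N1, N2 = sum(n1), sum(n2)
T = N1 + N2
P = N1 / T  # city-wide pairwise proportion for Group 1

# D scores y as a two-value step function of p based on P; S scores y = p.
d_means = index_as_mean_difference(n1, n2, lambda p: 1.0 if p >= P else 0.0)
s_means = index_as_mean_difference(n1, n2, lambda p: p)

# Conventional aggregate formulas, for comparison.
d_std = 0.5 * sum(abs(a / N1 - b / N2) for a, b in zip(n1, n2))
s_std = sum((a + b) * (a / (a + b) - P) ** 2 for a, b in zip(n1, n2)) / (T * P * (1 - P))

print(round(d_means, 4), round(d_std, 4), round(s_means, 4), round(s_std, 4))
```

Note that the step function used for D depends on P and therefore changes from city to city, whereas the scoring function used for S is the same in every application.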

Finally, I note that casting uneven distribution as a difference of group means on 
residential outcomes (y) based on area group proportion scores (p) provides a new
vantage point for understanding the origins and nature of bias in standard versions 
of popular indices of uneven distribution. In addition, it opens the door for a surpris- 
ingly simple and compelling solution that allows one to eliminate bias from index 
scores if desired. The scores of the resulting new “unbiased” versions of indices of 
uneven distribution are nearly identical to the scores of the conventional versions in
situations where the conventional scores are non-problematic and they provide 
attractive alternatives in situations where conventional scores cannot be used — for 
example in the study of White-Latino segregation in new destinations (Fossett et al. 
2015). 

In closing, I note that the approach to investigating segregation I have outlined 
here complements and extends previous traditional approaches to studying aggre- 
gate segregation. It does not put approaches and findings from past research to the 
curb. To the contrary, the framework I offer here is fully compatible with main- 
stream traditions of research focusing on aggregate segregation. Casting segrega- 
tion in terms of group differences in individual residential outcomes and equating 
index scores with the effect of race in micro-level attainment models does not pre- 
clude pursuing traditional analysis of aggregate-level segregation; that remains as 
an option for those who prefer that approach. In addition, however, there now is a 
new set of alternatives for computing the indices that are used in such studies. More 
importantly, the framework I offer provides researchers new options for interpreta- 
tion and analysis that I believe many will view as potentially useful. These include: 
new options for extending previous research investigating variation in segregation 
across cities and over time; new options for taking account of non-racial social 
characteristics when investigating racial segregation; new alternatives for assessing 
the substantive implications of segregation based on the consequences it has for
group differences in residential outcomes; and new options for theorizing about and 
investigating the social attainment processes that give rise to aggregate segregation. 
I encourage researchers to adopt the refined measures and new options for analysis 
outlined here because I believe they will enable researchers to move forward in 
ways that will yield more trustworthy measurement of segregation and better under- 
standing of how group differences in residential distributions arise from group dif- 
ferences in residential attainments resulting from the role of race in residential 
dynamics. 
Appendices


Appendix A: Summary of Notation and Conventions 


This appendix reviews the notation and conventions for terms used in this mono- 
graph. Where appropriate it provides commentary to clarify usage by context. 


Pairwise Calculations 


In standard applications, indices of uneven distribution are based on pairwise popula- 
tion counts and group proportions. The adjective “pairwise” indicates that calcula- 
tions use only population counts for the two groups in the segregation comparison. If 
other groups are present in the population, their counts are excluded and have no 
impact on index scores. Accordingly, unless indicated by direct statement or by obvi- 
ous context, references here to total counts and terms based on total counts (e.g., 
group proportions) should be taken as being based on pairwise comparisons; that is, 
based on the sum of the population counts for just the two groups in the comparison. 


Reference and Comparison Groups (Groups 1 and 2)


When index scores are calculated using the difference of means formulation intro- 
duced in this monograph it is necessary to designate one of the two groups in the 
segregation comparison as the “reference” or “focal” group. The second group is 
then designated the “comparison” group. The choice of which group is designated 
as the “reference” is arbitrary and it has no impact on the resulting index scores. The 
choice is necessary to organize calculations and facilitate presentation. For sub- 
scripting purposes it is convenient to designate the reference group as “Group 1” 
and the comparison group as “Group 2”. 
The empirical literature on residential segregation in US urban areas overwhelmingly
focuses on majority-minority segregation comparisons such as White-Black,
White-Latino, and White-Asian comparisons. Based on substantive concerns 
regarding majority-minority inequality and assimilation, it is customary to assess 
residential distributions for different minority groups — e.g., Blacks, Latinos, and 
Asians — in relation to the residential distribution of the majority group — Whites. I 
follow this custom and thus designate the majority group — Whites — as the reference 
group. 

This has no consequence for index scores or for their substantive implications. 
But it does structure discussion and interpretation of results to focus on implications 
for majority-minority inequality and residential assimilation. 


City-Wide Terms for Pairwise Calculations 


$N_1$ = the city-wide population count for Group 1, the “reference” or “focal” group.
$N_2$ = the city-wide population count for Group 2, the “comparison” group.

$T$ = the combined city-wide pairwise population count ($T = N_1 + N_2$).

$P$ = the city-wide proportion for Group 1 ($P = N_1/[N_1 + N_2]$).

$Q$ = the city-wide proportion for Group 2 ($Q = N_2/[N_1 + N_2]$; $Q = 1 - P$).


Area-Specific Terms for Pairwise Calculations 


$i$ = index for the areas of the city; applied where appropriate, omitted to reduce
clutter when unnecessary (e.g., when clear based on context).

$j$ = a second index for the areas of the city used in formulas where one area (denoted
by $i$) is compared to other areas (denoted by $j$).

$n_1$ = the area population count for Group 1, the reference group.

$n_2$ = the area population count for Group 2, the comparison group.

$t$ = the combined area pairwise count ($t = n_1 + n_2$).

$p$ = the area proportion for Group 1 ($p = n_1/[n_1 + n_2]$).

$q$ = the area proportion for Group 2 ($q = n_2/[n_1 + n_2]$; $q = 1 - p$).

$s_1$ = the area share of the city-wide Group 1 population ($s_1 = n_1/N_1$).
$s_2$ = the area share of the city-wide Group 2 population ($s_2 = n_2/N_2$).
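
As a hedged illustration of the notation above, the following Python sketch computes the city-wide and area-specific pairwise terms from a small set of hypothetical counts. The data and the variable names (`n1`, `n2`, `s1`, `s2`, and so on) are assumptions chosen to mirror the symbols in this appendix; the sketch also assumes that areas containing no members of either group have been dropped beforehand.

```python
# Hypothetical pairwise counts for four areas (illustrative data only).
n1 = [90, 60, 30, 10]  # Group 1 (reference group) count in each area
n2 = [10, 40, 70, 90]  # Group 2 (comparison group) count in each area

# City-wide terms
N1, N2 = sum(n1), sum(n2)
T = N1 + N2            # combined pairwise population
P = N1 / T             # city-wide proportion for Group 1
Q = N2 / T             # city-wide proportion for Group 2 (Q = 1 - P)

# Area-specific terms (assumes every area has t > 0)
t = [a + b for a, b in zip(n1, n2)]    # combined area pairwise count
p = [a / ti for a, ti in zip(n1, t)]   # area proportion for Group 1
q = [b / ti for b, ti in zip(n2, t)]   # area proportion for Group 2
s1 = [a / N1 for a in n1]              # area share of city-wide Group 1
s2 = [b / N2 for b in n2]              # area share of city-wide Group 2

print(f"T={T}, P={P:.3f}, Q={Q:.3f}")
```

Counts for any third group never enter these calculations, which is what “pairwise” means in practice.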


Terms for Individuals or Households 


$k$ = an index for individuals in a group or, depending on context, in the city-wide
population.

$m$ = an index similar to $k$ for individuals in a group or in the city-wide population.
This is relevant for some formulas for the Gini Index (G) where individuals
indexed by $k$ are compared to all other individuals in the population indexed
by $m$.


Selected Terms and Conventions Relevant for the Gini Index (G) 


$X_i$ = cumulative proportion of Group 1 based on ordering areas from low to high
on $p_i$ and then summing area group share terms ($X_i = \sum s_1$ over relevant areas).

$Y_i$ = cumulative proportion of Group 2 based on ordering areas from low to high
on $p_i$ and then summing area group share terms ($Y_i = \sum s_2$ over relevant areas).
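
The cumulative proportions can be sketched in a few lines of Python. This is a minimal illustration under assumed data; the counts and the names `areas`, `X`, and `Y` are hypothetical and are not taken from the monograph.

```python
from itertools import accumulate

# Hypothetical pairwise counts for four areas (illustrative data only).
n1 = [90, 60, 30, 10]  # Group 1 counts by area
n2 = [10, 40, 70, 90]  # Group 2 counts by area
N1, N2 = sum(n1), sum(n2)

# Rank areas from low to high on p, the area proportion for Group 1.
areas = sorted(zip(n1, n2), key=lambda a: a[0] / (a[0] + a[1]))

# Cumulate the area shares s1 = n1/N1 and s2 = n2/N2 over the ranked areas.
X = list(accumulate(a / N1 for a, _ in areas))
Y = list(accumulate(b / N2 for _, b in areas))
```

Because areas are ranked from low to high on the Group 1 proportion, Group 2 accumulates at least as fast as Group 1, so each value of `Y` is at least as large as the matching value of `X`, and both series end at 1.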


Selected Terms and Conventions Relevant for the Theil Entropy 
Index (H) 


The original derivation of the Theil index is grounded in an information theory 
framework (Shannon 1948; Theil and Finizza 1971; Theil 1972) drawing on a 
notion of entropy (E) quantified as given below. 


$E$ = entropy for the city overall given by $E = P \cdot \log_2(1/P) + Q \cdot \log_2(1/Q)$.
$E_i$ = entropy for area $i$ given by $E_i = p_i \cdot \log_2(1/p_i) + q_i \cdot \log_2(1/q_i)$.


Note that $\log_2$ denotes the base 2 logarithm. Many applications use natural logarithms
in place of base 2 logarithms.
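
A minimal Python sketch of the entropy terms follows. The helper name `entropy` and the input values are assumptions for illustration. The sketch also adopts the usual convention that a term with a zero proportion contributes 0, which the definitions above leave implicit.

```python
import math

def entropy(p):
    """Two-group entropy in bits; terms with a zero proportion contribute 0."""
    return sum(x * math.log2(1 / x) for x in (p, 1 - p) if x > 0)

P = 0.475                        # hypothetical city-wide proportion for Group 1
p_areas = [0.9, 0.6, 0.3, 0.1]   # hypothetical area proportions for Group 1

E = entropy(P)                          # entropy for the city overall
E_i = [entropy(p) for p in p_areas]     # entropy for each area
```

Entropy reaches its maximum of 1 bit when the two groups are equal in size and falls to 0 when an area contains only one group.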


Selected Terms and Conventions Relevant for the Atkinson 
Index (A) 


Formulas for the Atkinson index (A) include two constants, $\alpha$ and $\beta$. Values for $\alpha$
are restricted to fall between 0 and 1 exclusive of end points (i.e., $0 < \alpha < 1$). $\beta$ is
obtained by $1 - \alpha$. The Atkinson index is symmetric when $\alpha$ is 0.5 and is asymmetric
otherwise. When A is asymmetric it yields different index values depending on
which of the two groups in the comparison is adopted as the reference group.
This leads some to view asymmetric versions of A as unacceptable for
use as a general measure of segregation (White 1986). I agree with this view.
Accordingly, discussion of the Atkinson index in this monograph is limited to the
symmetric version where $\alpha = \beta = 0.5$. This version of the Atkinson index has close
relations with the Hutchens square root index (R), which is more tractable
mathematically.


Appendix B: Formulating Indices of Uneven Distribution
as Overall Averages of Individual-Level Residential Outcomes 


This appendix chapter reviews alternative formulations of indices of uneven distri- 
bution to clarify how aggregate segregation is related to individual residential out- 
comes. This is useful for at least two inter-related reasons; one substantive and one 
methodological. The substantive reason is that sociological interest in segregation 
usually rests on the assumption that it has important implications for individual life 
chances associated with area of residence. Based on this concern, it would be useful 
to better understand how indices of uneven distribution register individual residen- 
tial outcomes. The methodological reason is that formulating indices of uneven dis- 
tribution in terms of individual residential outcomes is a necessary step for clarifying 
how segregation emerges from individual-level residential attainment processes. 

The view that segregation emerges from micro-level attainment processes and 
carries important implications for group differences in residential outcomes is 
hardly new or controversial. In light of this it is surprising that methodological dis- 
cussions of indices of uneven distribution give little attention to this issue. For 
example, consider two familiar formulas for the widely used Gini Index (G) and the 
Delta or Dissimilarity Index (D) shown in Fig. B.1.¹ These formulas were featured
five decades ago in Duncan and Duncan’s (1955) landmark methodological study. 
These formulas and close variations on them are widely used in empirical studies in 
part because they are computationally efficient and are easy to implement. However, 
Duncan and Duncan raised the concern that “[i]n none of the literature on segrega- 
tion indices is there a suggestion of how to use them to study the process of segrega- 
tion” (1955:216, emphasis in original). The reason for this is that the formulas given 
in Fig. B.1 provide little basis for understanding how segregation is connected to the 
residential outcomes of individuals. Indeed, individual-level residential outcomes 
are “invisible” in these formulas. 

Advances in computing technology have rendered the issue of computing effi- 
ciency mostly irrelevant. Yet it is still typical for the measurement of segregation 
using G, D, and other indices to be discussed in relation to convenient computing 
formulas. It is fine to use efficient computing formulas for the narrow purpose of 
obtaining index values. But researchers and broad audiences who gain their under- 
standing of segregation based solely on these formulas will have, at best, only vague 
notions regarding how segregation arises from micro-level attainment processes. 
This problem can be addressed by considering alternative formulations of popular 
segregation indices that clarify how index scores are connected to individual resi- 
dential outcomes. 


¹ Figure B.1 also includes a similar style formula for the more recently introduced Hutchens square
root index (R) (2001).


$G = 100 \cdot \left( \sum X_{i-1} Y_i - \sum X_i Y_{i-1} \right)$, where X and Y are group proportions cumulated over
areas ranked low to high on $p_i$ (Duncan and Duncan 1955)

$D = 100 \cdot \tfrac{1}{2} \sum \left| (n_{1i}/N_1) - (n_{2i}/N_2) \right|$ (Duncan and Duncan 1955)

$R = 100 \cdot \left( 1.0 - \sum \sqrt{(n_{1i}/N_1) \cdot (n_{2i}/N_2)} \right)$ (Hutchens 2001:23)


Fig. B.1 Area-based computing formulas for indices of uneven distribution that do not draw on
individual-level residential outcomes (Note: N denotes city-wide population count, n denotes area
population count, subscripts 1 and 2 denote the two groups in the segregation comparison, subscript
i denotes area, $X_i$ and $Y_i$ denote the cumulative proportions of groups 1 and 2 over areas
ranked from low to high on $p_i$, the group 1 (reference group) proportion in the combined group
population in area i ($p_i = n_{1i}/[n_{1i} + n_{2i}]$))
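
For readers who want to try the area-based computing formulas, here is a minimal Python sketch under assumed data. The function name `area_based_indices` and the counts are hypothetical. One caution: the sign of the cross-product sum for G depends on the ranking direction and on which group is cumulated as X, so the sketch reports its absolute value rather than relying on a particular sign convention.

```python
import math
from itertools import accumulate

def area_based_indices(n1, n2):
    """Return (G, D, R) on the 0-100 scale from pairwise area counts."""
    N1, N2 = sum(n1), sum(n2)
    # Rank areas from low to high on p = n1 / (n1 + n2); assumes no empty areas.
    areas = sorted(zip(n1, n2), key=lambda a: a[0] / (a[0] + a[1]))
    s1 = [a / N1 for a, _ in areas]   # area shares of Group 1
    s2 = [b / N2 for _, b in areas]   # area shares of Group 2

    # Cumulative proportions, with X_0 = Y_0 = 0 prepended.
    X = [0.0] + list(accumulate(s1))
    Y = [0.0] + list(accumulate(s2))
    cross = sum(X[i - 1] * Y[i] - X[i] * Y[i - 1] for i in range(1, len(X)))
    G = 100 * abs(cross)  # absolute value: the sign depends on conventions

    D = 100 * 0.5 * sum(abs(a - b) for a, b in zip(s1, s2))
    R = 100 * (1.0 - sum(math.sqrt(a * b) for a, b in zip(s1, s2)))
    return G, D, R

# Hypothetical counts for four areas (illustrative data only).
G, D, R = area_based_indices([90, 60, 30, 10], [10, 40, 70, 90])
```

Nothing in these calculations refers to an outcome experienced by an individual; the only inputs are area shares of each group's city-wide total.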


Focusing Attention on Individual-Level Residential Outcomes 


All widely used indices of uneven distribution can be formulated in terms of 
individual-level residential outcomes (y) that are scored from area group (e.g., 
racial) proportions (p). This can be done in two distinct ways. One is to formulate 
index scores as simple overall averages of individual-level residential outcomes (y). 
The other is to formulate index scores as a difference of group means on individual- 
level residential outcomes (y). Both approaches can be used to obtain “correct” 
index values. But that is a minor benefit as convenient formulas for obtaining cor- 
rect index values are readily available. The main benefit of these formulations is that 
they can be used to gain insight into how different indices register and summarize 
individual residential outcomes. In addition, formulating indices in terms of indi- 
vidual attainments brings certain practical advantages which I note below. 

Figure B.2 presents computing formulas that highlight how individual-level resi- 
dential outcomes are registered by six popular measures of uneven distribution — the 
Gini Index (G), the Delta or Dissimilarity Index (D), the Atkinson Index (A), the 
Hutchens Square Root Index (R), the Theil Entropy index (H), and the Separation 
Index (S) (also known as the variance ratio [V] and eta squared [$\eta^2$]). The calculations
indicated in these formulas involve first computing scores for specific areas (i.e.,
neighborhoods) based on pairwise group proportions and then averaging these
scores over individuals. More specifically, the formulas have the following
features:


• the core terms in the calculations are scores computed for areas (indexed here by
“i”) based on calculations involving area group proportions; that is, involving the
values of $p_i$ and $q_i$ as given in Appendix A,

• the area-specific scores are summed over all individuals based on weighting the
score for each area by the area-specific combined population count (t) for the two
groups in the segregation comparison,

• the population-weighted sum of area-specific scores is then divided by the combined
population of the two groups for the city (T) to obtain an overall average,
and

• any other terms present in the formula serve only to rescale the resulting overall
average to the range 0-1.


Based on these features, it is appropriate to describe the resulting index value as an
overall individual-level average on area-specific residential outcomes scored from
area group composition (p).


$G = 100 \cdot \frac{1}{2T^2PQ} \cdot \sum_i \sum_j t_i t_j \, |p_i - p_j|$
$\phantom{G} = 100 \cdot \frac{1}{2TPQ} \cdot \sum_i t_i \left[ (1/T) \cdot \sum_j t_j \, |p_i - p_j| \right]$ (noted to clarify area-specific term)

$D = 100 \cdot \frac{1}{2TPQ} \cdot \sum_i t_i \, |p_i - P|$ (noted to highlight similarities with S)

$A = 100 \cdot \left[ 1 - (Q/P) \cdot \left\{ \sum_i t_i \, (p_i^{\alpha} q_i^{\beta} / QT) \right\}^{1/\alpha} \right]$ where $0 < \alpha < 1$ and $\beta = 1 - \alpha$

Setting $\alpha = \beta = 0.5$ yields the “symmetric” version of A, the version most relevant
for use in segregation analysis. Using this setting for $\alpha$, the formula of A can be
expressed in the two formulations shown below to highlight similarities with
formulas for the Hutchens square root index (R) and the separation index (S).

$A = 100 \cdot \left[ 1 - \left\{ \sum_i (t_i/T) \sqrt{p_i q_i} \right\}^2 / PQ \right]$ (noted to highlight similarities with R & S)
$\phantom{A} = 100 \cdot \left[ 1 - \left\{ (1/T) \cdot \sum_i t_i \sqrt{p_i q_i} \right\}^2 / PQ \right]$ (noted to highlight similarities with R & S)

$R = 100 \cdot \left[ 1.0 - \sum_i (t_i/T) \sqrt{p_i q_i / PQ} \right]$ (noted to highlight similarities with A & S)
$\phantom{R} = 100 \cdot \left[ 1.0 - (1/T) \cdot \sum_i t_i \sqrt{p_i q_i / PQ} \right]$ (noted to highlight similarities with A & S)

$H = 100 \cdot \sum_i \left[ (E - E_i)/E \right] \cdot (t_i/T)$
$\phantom{H} = 100 \cdot (1/T) \cdot \sum_i t_i \left[ (E - E_i)/E \right]$

where E is entropy for the city overall given by $E = P \cdot \log_2(1/P) + Q \cdot \log_2(1/Q)$
per information theory (Shannon 1948; Theil 1972) and $E_i$ is entropy for area i
and is given by $E_i = p_i \log_2(1/p_i) + q_i \log_2(1/q_i)$. If desired, one can use natural
logarithms as well as base 2 logarithms.

$S = 100 \cdot \frac{1}{TPQ} \cdot \sum_i t_i \, (p_i - P)^2$ (noted to highlight similarities with D)
$\phantom{S} = 100 \cdot \left[ 1 - \sum_i (t_i/T)(p_i q_i / PQ) \right]$ (noted to highlight similarities with A & R)
$\phantom{S} = 100 \cdot \left[ 1 - (1/T) \cdot \sum_i t_i \, (p_i q_i / PQ) \right]$ (noted to highlight similarities with A & R)


Fig. B.2 Area-based computing formulas for indices of uneven distribution that implicitly feature
averages for individual-level residential outcomes

Figure B.3 reorganizes the expressions in Fig. B.2 to present them in a form that 
explicitly casts each index in terms of an index-specific, individual-level residential 
outcome (y) that is averaged over all individuals in the two groups in the compari- 
son. The formulas in this figure are not necessarily the most convenient for comput- 
ing index scores. But they make it clear that aggregate segregation index scores can 
be understood as simple summary measures (i.e., means) for individual residential 
outcomes. 
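
The point that each index is a mean of an individual-level outcome can be checked with a short Python sketch. This is an illustration under assumptions, not a definitive implementation: the function name `indices_as_averages`, the helper `entropy`, and the counts are hypothetical, every area is assumed to be non-empty, and both groups are assumed to be present in the city.

```python
import math

def entropy(p):
    """Two-group entropy in bits; terms with a zero proportion contribute 0."""
    return sum(x * math.log2(1 / x) for x in (p, 1 - p) if x > 0)

def indices_as_averages(n1, n2):
    """Indices (0-100) as population-weighted means of area scores y."""
    t = [a + b for a, b in zip(n1, n2)]
    T = sum(t)
    P = sum(n1) / T
    Q = 1 - P
    p = [a / ti for a, ti in zip(n1, t)]
    E = entropy(P)

    def mean_y(score):
        # Each of the t_i individuals in area i receives the area's score.
        return sum(ti * score(pi) for ti, pi in zip(t, p)) / T

    D = 100 * mean_y(lambda pi: abs(pi - P) / (2 * P * Q))
    S = 100 * mean_y(lambda pi: (pi - P) ** 2 / (P * Q))
    R = 100 * (1 - mean_y(lambda pi: math.sqrt(pi * (1 - pi) / (P * Q))))
    H = 100 * mean_y(lambda pi: (E - entropy(pi)) / E)
    # Symmetric A from R, applying A = 2R - R^2 on the 0-1 scale.
    A = 100 * (2 * (R / 100) - (R / 100) ** 2)
    return {"D": D, "S": S, "R": R, "H": H, "A": A}

# Hypothetical counts for four areas (illustrative data only).
scores = indices_as_averages([90, 60, 30, 10], [10, 40, 70, 90])
```

The helper `mean_y` is the same for every index; only the scoring function passed to it changes, which is the sense in which the indices differ only in how they score area composition.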

The individual level residential outcomes (y) identified in Fig. B.3 can be char- 
acterized as follows: the outcomes register the degree to which the group proportion 
for the area (p;) departs from the group proportion for the city as a whole. The spe- 
cific way in which this departure is quantified varies from one index to another and 
that becomes the basis for each one’s unique way of registering uneven distribution.


Index, averaging scores for y over individuals; scores assigned to individuals:

$G = 100 \cdot (1/T) \cdot \sum_k y_k$;  $y_k = \sum_m |p_k - p_m| / 2TPQ$
    where k and m index individuals, $p_k$ denotes the pairwise area proportion for the
    reference group ($p_i$) for the k’th individual, and $p_m$ denotes area proportion for
    the reference group ($p_i$) for the m’th individual (note, this reorganizes the terms
    in the second formula for G in Figure B.2)

$D = 100 \cdot (1/T) \cdot \sum_k y_k$;  $y_k = |p_i - P| / 2PQ$

A = No comparable solution is available, but the value of the “symmetric” version of A
    (given by setting $\alpha = \beta = 0.5$) can be obtained from $2R - R^2$

$R = 100 \cdot [1 - (1/T) \cdot \sum_k y_k]$;  $y_k = \sqrt{p_i q_i / PQ}$

$H = 100 \cdot (1/T) \cdot \sum_k y_k$;  $y_k = (E - E_i)/E$ with $E_i$ and $E$ as given in Figure B.2

$S = 100 \cdot (1/T) \cdot \sum_k y_k$;  $y_k = (p_i - P)^2 / PQ$, or
$S = 100 \cdot [1 - (1/T) \cdot \sum_k y_k]$;  $y_k = p_i q_i / PQ$


Fig. B.3 Alternative formulas for uneven distribution that explicitly cast indices as overall averages
of residential outcomes (y) for individuals (Note: k and m index individuals, $p_k$ denotes the
pairwise area proportion for the reference group ($p_i$) for the k’th individual, $p_m$ denotes area
proportion for the reference group ($p_i$) for the m’th individual)


But all of the indices can be understood as registering average exposure to depar- 
tures from the group mix that would obtain under even distribution. If all neighbor- 
hoods have the group mix of the city as a whole, all of the values of y will be 0 and 
the final index value also will be 0. If members of the two groups never reside in the 
same areas, the values of y move to the extreme values that can apply to individuals 
residing in neighborhoods where p; is 1 or 0 and the sum of y goes to the maximum 
value possible for the city given its group composition. The resulting sum is then 
rescaled to yield an index value of 1 by incorporating index-specific constant terms
(e.g., 2PQ for D).
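
The two limiting cases can be illustrated with the individual-level formula for G. The Python sketch below is a minimal illustration under assumed data; the function names `gini_from_individuals` and `expand` are hypothetical, and the double loop over individuals is practical only for small examples.

```python
def gini_from_individuals(p_k, P):
    """G (0-100) as the mean of y_k = sum_m |p_k - p_m| / (2*T*P*Q)."""
    T = len(p_k)
    Q = 1 - P
    y = [sum(abs(pk - pm) for pm in p_k) / (2 * T * P * Q) for pk in p_k]
    return 100 * sum(y) / T

def expand(n1, n2):
    """One entry of area proportion p per individual, plus the city-wide P."""
    p_k = []
    for a, b in zip(n1, n2):
        p_k += [a / (a + b)] * (a + b)
    return p_k, sum(n1) / (sum(n1) + sum(n2))

# Even distribution: every area has the city-wide group mix, so every y is 0.
even = gini_from_individuals(*expand([10, 20], [30, 60]))
# Complete segregation: the groups share no areas, so the index is at its maximum.
complete = gini_from_individuals(*expand([50, 0], [0, 50]))
```

Under even distribution every individual's score is 0 and the index is 0; under complete segregation the rescaled average reaches 100, which corresponds to 1 on the 0-1 scale.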


Options for Spatial Versions of Indices of Uneven Distribution 


These index formulations carry at least one practical benefit; they can be used to 
calculate spatial segregation scores as well as aspatial segregation scores for any of 
the indices. That is, 


Formulas that cast segregation index values as overall averages on individual-level 
residential outcomes can readily be adapted for computing spatial as well as 
aspatial versions of the segregation indices. 


Aspatial versions of segregation indices are familiar and widely used in empirical 
studies. They are obtained by applying the computing formulas introduced here, or
any of the formulas introduced earlier, using data for non-overlapping “bounded”
areas such as school districts, census tracts, block groups, or blocks. In the aspatial 
formulation, each bounded area represents a particular neighborhood and every 
individual or household in the area is treated as having the residential outcome cal- 
culated for this area. 

When index values are cast as overall averages of individual-level residential 
outcomes as in Fig. B.3, the indices also can be implemented in spatial measures. 
This is accomplished by computing averages for individual residential outcomes (y) 
that are scored for “overlapping” spatially-defined neighborhoods that are specified
uniquely for each individual based on the population residing within a spatially 
defined neighborhood. For example, the spatial formulation could be implemented 
using census data by taking small bounded areas such as census blocks and defining 
the spatial neighborhood as the population residing in the “focal” block plus the 
surrounding adjacent blocks. In this approach the population in any particular block 
will be part of a uniquely defined, spatially delimited neighborhood.

When using these formulas, the question of whether the index is viewed as aspa- 
tial or spatial depends only on how “neighborhoods” are conceived. This can be 
stated in general terms as follows. Whether or not the index values obtained using 
these formulas are properly described as spatial or aspatial is determined by the 
definitions of the neighborhoods used to calculate the individual-level residential 
outcomes used in the relevant index calculations. If the residential outcomes are for 
non-overlapping bounded areas, the index values are aspatial. If the residential out- 
comes are for individual-specific, overlapping neighborhoods, then the index values 
are spatial. 
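
The distinction can be made concrete with a short Python sketch that computes the separation index (S) both ways. This is an illustration under assumptions rather than a prescribed procedure: the function name `separation_index`, the `neighbors` adjacency mapping, and the counts are hypothetical, and the spatial neighborhood is taken to be the focal area plus its adjacent areas, as in the census block example above.

```python
def separation_index(n1, n2, neighbors=None):
    """S (0-100) as the mean of y = (p - P)^2 / (P*Q) over individuals.

    neighbors[i] lists the areas adjacent to area i. When given, each
    individual's p is computed for the focal area plus its adjacent areas
    (a spatial version); when omitted, p is computed for the focal area
    alone (the aspatial version).
    """
    T = sum(n1) + sum(n2)
    P = sum(n1) / T
    Q = 1 - P
    total = 0.0
    for i in range(len(n1)):
        members = [i] + (list(neighbors[i]) if neighbors else [])
        g1 = sum(n1[j] for j in members)
        g2 = sum(n2[j] for j in members)
        p = g1 / (g1 + g2)              # composition of area i's neighborhood
        y = (p - P) ** 2 / (P * Q)      # outcome for residents of area i
        total += (n1[i] + n2[i]) * y    # weight by the focal-area population
    return 100 * total / T

# Hypothetical counts for four blocks arranged in a row (illustrative only).
n1 = [90, 60, 30, 10]
n2 = [10, 40, 70, 90]
adjacent = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

aspatial = separation_index(n1, n2)
spatial = separation_index(n1, n2, adjacent)
```

The averaging step is identical in both calls; only the definition of the neighborhood used to score each individual's outcome changes. In this example the spatial score is lower because pooling adjacent blocks moves each neighborhood's composition closer to the city-wide mix.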


Summary of Difference of Means Formulations 


I now review a second way in which indices of uneven distribution can be formu- 
lated in terms of individual-level residential outcomes. This is to cast each index as 
a difference of group means on individual-level residential outcomes. Groups are 
designated as groups 1 and 2 with group 1 being taken as the reference group.² Each
segregation index value (S) is then given as the difference of group means ($\bar{Y}_1 - \bar{Y}_2$)
on individual residential outcomes (y) that are scored as a function of the pairwise
proportion for group 1 in the area in which the individual resides (i.e., $y = f(p)$).

Figure B.4 gives formulas for calculating values of popular segregation indices 
in this manner. My intent here is only to introduce formulas that place popular indi- 
ces of uneven distribution in the general “difference of group means” framework. 
Appendices C-F provide detailed discussions of the mathematical basis for the for- 
mulas given here. The body of the monograph provides a more general discussion 
of this new measurement approach and the benefits associated with adopting it. 


² The choice of which group serves as the reference is arbitrary in the sense that the index score
obtained is the same either way.


Index formulated as a difference of means; residential outcome scores (y) assigned to
individuals based on $y = f(p)$:

$G = 100 \cdot 2(\bar{Y}_1 - \bar{Y}_2)$;  $y_i = f(p_i)$ = relative rank (quantile scoring) on $p_i$

$D = 100 \cdot (\bar{Y}_1 - \bar{Y}_2)$;  $y_i = f(p_i)$ = 0 if $p_i < P$, 1 if $p_i \geq P$

    Alternatively, compute D as a simplified version of G based on collapsing area
    values for $p_i$ into a two-category rank scheme consisting of areas where $p_i < P$
    and areas where $p_i \geq P$.

A = No direct solution is yet found, but $A = 2R - R^2$ for the “symmetric” version of A
    given by setting $\alpha = \beta = 0.5$.

$R = 100 \cdot (\bar{Y}_1 - \bar{Y}_2)$;  $y_i = Q + \left( 1 - \sqrt{p_i q_i / PQ} \right) / (p_i/P - q_i/Q)$

$H = 100 \cdot (\bar{Y}_1 - \bar{Y}_2)$;  $y_i = Q + \left[ (E - E_i)/E \right] / (p_i/P - q_i/Q)$

$S = 100 \cdot (\bar{Y}_1 - \bar{Y}_2)$;  $y_i = p_i$


Fig. B.4 Formulas casting indices of uneven distribution (S) as group differences of means
($\bar{Y}_1 - \bar{Y}_2$) on individual residential outcomes (y) (Note: $p_i$ denotes the pairwise area proportion for
the reference group in the area where individual i resides and $y_i$ is the residential outcome score
generated by the index-specific scoring function $f(p_i)$)


For the moment I note that the approach is attractive on conceptual grounds 
because these formulas clarify that segregation indices measure whether groups
experience similar or different averages on specific residential outcomes. 
Additionally, the formulas reveal that differences between indices arise from a sin- 
gle source; the specific nature of the scaling function y=f (p) that scores residen- 
tial outcomes (y) from values of area group proportion (p). Area group proportion 
(p) reflects simple group contact or exposure in its original or “natural” metric. The 
scoring function y=f(p) rescales group contact and maps it onto an alternative 
scaling metric for residential outcomes (y) specific to the index in question. From 
this perspective all popular indices of uneven distribution register group differences 
of means on “scaled” pairwise group contact. 
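
A minimal Python sketch can show the difference of means calculation for S, D, and R. The function names `group_mean_difference` and `f_R` and the counts are hypothetical. One assumption deserves emphasis: the scoring function for R is undefined where $p_i = P$ because its numerator and denominator are both 0, so the sketch substitutes 0.5, which I take to be the limiting value. The substitution does not affect the result because such areas contribute nothing to the difference of means.

```python
import math

def group_mean_difference(n1, n2, f):
    """Mean of y = f(p) for Group 1 minus the mean for Group 2."""
    N1, N2 = sum(n1), sum(n2)
    y = [f(a / (a + b)) for a, b in zip(n1, n2)]
    mean1 = sum(a * yi for a, yi in zip(n1, y)) / N1
    mean2 = sum(b * yi for b, yi in zip(n2, y)) / N2
    return mean1 - mean2

# Hypothetical counts for four areas (illustrative data only).
n1 = [90, 60, 30, 10]
n2 = [10, 40, 70, 90]
P = sum(n1) / (sum(n1) + sum(n2))
Q = 1 - P

def f_R(p):
    # y = Q + (1 - sqrt(pq/PQ)) / (p/P - q/Q); guarded where p equals P.
    denom = p / P - (1 - p) / Q
    if abs(denom) < 1e-12:
        return 0.5  # assumed limiting value; carries zero weight in the result
    return Q + (1 - math.sqrt(p * (1 - p) / (P * Q))) / denom

S = 100 * group_mean_difference(n1, n2, lambda p: p)                       # y = p
D = 100 * group_mean_difference(n1, n2, lambda p: 1.0 if p >= P else 0.0)  # y = 0 or 1
R = 100 * group_mean_difference(n1, n2, f_R)
```

The same function produces all three indices; the only thing that changes from one call to the next is the scoring function applied to area group proportion.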


Appendix C: Establishing the Scaling Functions y=f (p) 
Needed to Cast the Gini Index (G) and the Dissimilarity Index 
(D) as Differences of Group Means on Scaled Pairwise 
Contact 


This is the first of several appendix chapters which establish how popular indices of 
uneven distribution can be placed in the “difference of group means” framework. 
The key feature of this framework is that the values of each index are obtained as a
simple difference of group means on individual residential outcomes (y) that are
scored from 0 to 1 based on area group proportion (p) computed from pairwise
population counts. Taking the familiar example of White-Black segregation, area
group proportion (p) can be set to proportion White of the combined White and
Black population in the area; that is, $p = w/(w + b)$ where w and b are the counts of
Whites and Blacks, respectively, in the area.³ Residential outcome scores (y) are
then obtained from an index-specific scaling function $y = f(p)$ that takes values of
p that range from 0 to 1 and rescales them to new values that also range from 0 to 1.
The segregation index score is then obtained from the difference ($\bar{Y}_W - \bar{Y}_B$) where
$\bar{Y}_W$ and $\bar{Y}_B$ are the group means for Whites and Blacks, respectively, on residential
outcomes (y).

For individuals, p registers simple pairwise “contact” or “exposure” to the refer- 
ence group based on residing in a given area. In the example under consideration the 
reference group is Whites and p thus registers “contact with” or “exposure to” 
Whites. The residential outcome score (y) can be described as “scaled pairwise 
contact” or “scaled pairwise exposure”. Accordingly, the segregation index score 
can be described as a difference of group means on scaled contact; in the example 
under consideration, it is the White-Black difference in average scaled contact with 
Whites. 


The General Task 


The key to placing a particular index of uneven distribution in the difference of 
means framework is to identify a scaling function y=f (p) that accomplishes the 
goal of scoring residential outcomes (y) from area group proportions (p) such that 
the scores for y fall over the range 0-1 and yield the value of the index of interest as 
a difference of means on y for the two groups in the segregation comparison. I have 
identified scaling functions meeting these criteria for all popular indices of uneven 
distribution including: the Gini index (G), the delta or dissimilarity index (D), the
Hutchens square root index (R), the Theil entropy index (H) and the separation 
index (S). Placing these various indices in the difference of means framework gives 
them a common basis for interpretation and a specific basis for comparison. The 
common basis for interpretation is that all indices measure White-Black differences 
in average scaled contact with Whites. The specific basis for comparison is that the 
differences between index scores arise solely from differences in how index-specific 
scaling functions y=f ( p) map values of pairwise contact from its original or “natu- 
ral” metric based on area group proportion (p) onto values of residential outcomes 
(y). 

The main task of this appendix chapter and the ones that follow it is to establish
the particular scaling function $y = f(p)$ that will yield the value of the index in
question. The general approach is to start with a generic expression of the difference of
means formulation:

$$\text{Difference of Means Formula} = \bar{Y}_W - \bar{Y}_B = (1/W) \cdot \sum w_i y_i - (1/B) \cdot \sum b_i y_i$$

Then equate this formula to a standard formula for the index of interest and
manipulate the full expression to obtain a solution for y. In this appendix chapter
and the ones that follow it I review steps that accomplish this task and establish a
basis for an index-specific scaling function $y = f(p)$ relevant for G, D, R, H, and S.


³ Alternatively, p can be set to area proportion Black. The choice is arbitrary as the index score is
the same either way.

I expect that many readers will not be especially interested in the derivations of 
the relevant scaling functions. With this in mind, I presented only the final formulas 
in the main body of this monograph and in the overview discussion just provided in 
Appendix B. Readers who are not interested in the details of these derivations can 
rely on these earlier presentations and skip the remainder of this chapter and the 
additional appendix chapters that follow. For those who elect to slog through the 
technical details, I thank you in advance for your patience and forbearance. I claim 
only that the derivations accomplish what is needed and apologize for the fact that 
they are tedious and inelegant. 


Introducing the Function y= f (p) for the Gini Index (G) 


For the Gini Index (G) the relevant scaling function y =f (p) is relatively simple; it 
is the quantile (percentile) or relative rank transformation. 


$y = \text{quantile}(p)$, or, more exactly,
$y = 2 \cdot \text{quantile}(p)$.


Under this scaling approach, households are assigned values on residential out- 
comes (y) based on the population-weighted relative rank position of their area of 
residence on area group proportion (p); more specifically, the quantile score on p for 
individuals. 

I review the quantile scaling function in more detail below. For the moment 
I note briefly that the scaling function y=f (p) for G is a continuous, monotonic, 
nonlinear transformation of p that changes p from its original or “natural” metric to 
a new scaling metric. The nonlinear transformation produces a curve that tends to 
rise faster when p is low and when p is high and tends to rise more slowly when p is 
in the middle ranges. As a result, the scaling transformation serves to exaggerate 
group differences on p over portions of the lower and upper ranges of the scale of p 
(i.e., $p < 0.25$ and $p > 0.75$) while compressing group differences on p over middle
portions of the range of p (i.e., $0.30 < p < 0.70$). Thus, the quantile transformation
can and often does change small quantitative differences between Whites and
Blacks on p into large differences on rank-order quantile scores. This in turn makes 
average White-Black differences on y larger than average White-Black differences 
on p. The tendency is moderate when groups are approximately equal in size. It 
becomes more and more pronounced when groups become increasingly unequal in 
size.

As formulated for the difference of group means framework, the Gini Index (G)
for White-Black segregation can be given by

$$\bar{Y}_W - \bar{Y}_B = G/2, \quad \text{or} \quad (\bar{Y}_W - \bar{Y}_B)/0.5 = 2(\bar{Y}_W - \bar{Y}_B) = G \qquad \text{(C.1)}$$

for $y = \text{quantile}(p)$, or, alternatively, for $y = 2 \cdot \text{quantile}(p)$,

$$\bar{Y}_W - \bar{Y}_B = G. \qquad \text{(C.1a)}$$


In this formulation residential outcomes (y) register each household’s relative rank
position on area proportion White (p), $\bar{Y}_W$ is the mean on y for White households,
and $\bar{Y}_B$ is the mean on y for Black households. One way to describe the formulation
is that the value of G is the observed difference of group means on quantile scores
for p divided by 0.5, the maximum value possible when scoring y as quantile scores.
Alternatively, if y is scored as twice the quantile score (i.e., $2 \cdot \text{quantile}(p)$), G is the
simple difference of means.⁴


G Is a Measure of Rank Order Inequality on Contact 


Surprisingly, methodological reviews of segregation indices rarely make, much less 
emphasize, the point that the Gini Index (G) assesses uneven distribution in terms of 
group differences in rank order standing on area group proportion scores (p). This 
quality of G has been noted in methodological studies that review the application of 
G as a measure of inter-group inequality on ordinal variables. Lieberson (1976) 
introduced a measure of inter-group inequality on ordinal outcomes which he 
termed the index of net difference (ND). He characterized ND as being “analogous” 
to G (1976:281). Fossett and South (1983) noted that ND and G are more than 
analogous; they are mathematically equivalent (this is established in expressions 
(C.2a) and (C.2b) below). Accordingly, ND can be characterized as an alternative 
computing formula for G that supports an explicit and potentially attractive substan- 
tive interpretation in terms of group difference in rank advantage. 

This provides an initial basis for interpreting G for White-Black segregation as 
an index of relative rank difference between Whites and Blacks in their distribution 
on residential contact with Whites (p). Specifically, in the ND formulation, the 
value of G is the difference of two probabilities: (a) the probability that a randomly
chosen White will have greater residential contact with Whites than will a randomly
chosen Black, and (b) the probability that a randomly chosen Black will have greater
residential contact with Whites than will a randomly chosen White.


⁴ Under maximum uneven distribution all Whites live in neighborhoods that are 100%
White and all Blacks live in neighborhoods that are 100% Black. Their respective average
quantile scores on area proportion White will be $1 - P/2$ for Whites and $Q/2$ for Blacks. The
group (White-Black) difference of means will be $(1 - P/2) - Q/2$, which resolves to
$1 - (P/2 + Q/2) = 1 - (P + Q)/2 = 1 - 1/2 = 0.5$.

Fossett and South (1983:861) note that the value of ND, and therefore G, can be 
obtained from the following computing formula 


$$ND = G = \sum_i \sum_j x \cdot (w_i/W)(b_j/B)$$


where i and j index areas ranked on area proportion White (p), and x is scored 1 if
($i > j$), 0 if ($i = j$), and $-1$ if ($i < j$). This formula highlights that G responds solely
to White-Black comparisons on rank order standing on area proportion White (p). 
Thus, it gives insight into why G is insensitive to the quantitative magnitude of 
group differences on p; G treats all White-Black differences on p as either 1 or —1, 
regardless of whether the difference involved is large or small.
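
The computing formula for ND can be sketched in Python as follows. The function name `net_difference` and the counts are hypothetical. The sketch compares areas directly on p rather than on rank positions, so areas with identical values of p are scored as ties (x = 0); this is an assumption of the sketch. The result is on the 0-1 scale.

```python
def net_difference(w, b):
    """ND = sum_i sum_j x * (w_i/W) * (b_j/B), with x the sign of (p_i - p_j)."""
    W, B = sum(w), sum(b)
    p = [wi / (wi + bi) for wi, bi in zip(w, b)]  # area proportion White
    nd = 0.0
    for i in range(len(w)):          # area of the randomly chosen White
        for j in range(len(b)):      # area of the randomly chosen Black
            x = (p[i] > p[j]) - (p[i] < p[j])   # 1, 0, or -1
            nd += x * (w[i] / W) * (b[j] / B)
    return nd

w = [90, 60, 30, 10]   # hypothetical White counts by area
b = [10, 40, 70, 90]   # hypothetical Black counts by area
ND = net_difference(w, b)
```

Only the direction of each White-Black comparison enters the sum. Replacing the values of p with any other values that preserve the ordering of areas would leave the result unchanged.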

Fossett and Siebert (1997, Appendix A) also explore the formulation of G as a 
measure of inter-group inequality on ranked outcomes. They showed that G is a 
special case of Somers’ $d_{yx}$, a measure of ordinal (rank-order) association.
Consequently, G can be interpreted as an ordinal slope coefficient that indicates the 
impact of race (i.e., group membership) on the rank order standing of individuals on 
residential contact with Whites (p). Of more direct relevance for the present discus- 
sion, Fossett and Siebert also noted that the value of G can be given as twice the 
difference of group means on percentile (or quantile) scores for ranked outcomes. In 
application to White-Black segregation this means that G registers the White-Black 
difference of means on quantile scores for contact with Whites (p). 


Calculating G as a Difference of Means 


The procedure for obtaining the value of G for White-Black segregation as a difference
of means on residential outcomes (y) can be given as follows. First implement
the relative rank scoring function $y = f(p)$ by ordering areas from low to high based
on values of area proportion White ($p_i$).⁵ Note that $p_i$ is calculated using only counts
for Whites and Blacks (i.e., $p_i = w_i/(w_i + b_i)$). Designate the number of households
in the area ranked lowest on area proportion White ($p_1$) by $t_1$ based on
$t_1 = w_1 + b_1$ where $w_1$ and $b_1$ are the counts for Whites and Blacks, respectively, in
the area. Then calculate the average relative rank position ($y_1$) on area proportion
White ($p_1$) for households in this area as $y_1 = (t_1/2)/T$ where T is the combined
population of Whites and Blacks in the city based on $T = W + B$. The calculation
reflects the fact that households in this area occupy ranks 1 through $t_1$ on area
proportion White (p) and so they all are assigned the average for this range of relative
rank positions. The number of households in the area ranked next lowest on area
proportion White ($p_2$) is designated by $t_2$. The average relative rank position ($y_2$) for
these households on area proportion White (p) is $[t_1 + (t_2/2)]/T$, the average for
the relative rank position for households in the area. Continue with this procedure
until all areas are scored on y.


⁵ Areas that are identical on area proportion White (p) can be combined and treated as single areas,
or they can be handled separately. There is no practical difference as the average score for y will
be the same either way.

The resulting White-Black difference of means on y is then given by 


$$Y_W - Y_B = \sum w_i y_i / W - \sum b_i y_i / B.$$


This result takes a value equal to G/2. 
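
The scoring procedure can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `rank_means` is hypothetical, every area is assumed to be populated, and the five-area counts are the ones used in the demonstration later in this appendix.

```python
def rank_means(w, b):
    """Return (Y_W, Y_B): group means on relative rank position y.

    w[i] and b[i] are White and Black counts for area i; only these two
    groups enter the calculation.
    """
    W, B = sum(w), sum(b)
    T = W + B
    # Order areas from low to high on area proportion White (p).
    areas = sorted(zip(w, b), key=lambda a: a[0] / (a[0] + a[1]))
    ranked_below = 0  # households in areas already scored
    y_w = y_b = 0.0
    for wi, bi in areas:
        ti = wi + bi
        y = (ranked_below + ti / 2) / T  # average relative rank in this area
        y_w += wi * y / W
        y_b += bi * y / B
        ranked_below += ti
    return y_w, y_b


y_w, y_b = rank_means([0, 65, 90, 95, 100], [100, 35, 10, 5, 0])
print(round(y_w, 3), round(y_b, 3), round(2 * (y_w - y_b), 3))  # 0.631 0.193 0.876
```

Doubling the difference of the two group means reproduces the value of G reported for the same five areas.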


Deriving G as a Difference of Means 


The next several sections establish that the difference of means formulation of the 
Gini Index (G) maps exactly onto the usual computing formulas for G. Unfortunately, 
the discussion is long and tedious. Readers who are not interested in these details 
should skip forward to the section that discusses the differences of means formula- 
tion of the Dissimilarity Index (D). 


Specifying Some Useful Terms and Relationships 


To begin, it is helpful to introduce several terms and establish certain relationships 
among them. I start by introducing the following three terms: 


$pt_i = t_i / T$, this term registers the i’th area’s proportion (share) of the city’s 
combined population of Whites and Blacks, 

$pw_i = w_i / W$, this term registers the i’th area’s proportion (share) of the city’s 
White population, and 

$pb_i = b_i / B$, this term registers the i’th area’s proportion (share) of the city’s Black 
population. 


When calculating G the areas of the city are ordered from lowest to highest value on 
area proportion White (p). This leads to the following terms 


$cpt_i = \sum pt_i = \sum t_i / T$, cumulative proportion (share) of the city’s combined 
population of Whites and Blacks residing in areas ranked 1 through i on area proportion 
White (p), 

$cpw_i = \sum pw_i = \sum w_i / W$, cumulative proportion (share) of the city’s White 
population residing in areas ranked 1 through i on area proportion White (p), and 

$cpb_i = \sum pb_i = \sum b_i / B$, cumulative proportion (share) of the city’s Black population 
residing in areas ranked 1 through i on area proportion White (p). 


These terms can be used to give the familiar computing formula for G introduced 
by Duncan and Duncan (1955: 211) as 


$$G = \sum pw_i \cdot \sum pb_{i-1} - \sum pb_i \cdot \sum pw_{i-1}. \tag{C.2}$$

This can be restated with alternative notation as 

$$G = \sum (cpw_i \cdot cpb_{i-1}) - \sum (cpb_i \cdot cpw_{i-1}). \tag{C.2a}$$


Recognizing that $(cpw_i \cdot cpb_{i-1}) = (pw_i \cdot cpb_{i-1}) + (cpw_{i-1} \cdot cpb_{i-1})$, and that 
$(cpb_i \cdot cpw_{i-1}) = (pb_i \cdot cpw_{i-1}) + (cpb_{i-1} \cdot cpw_{i-1})$, (C.2a) can be restated as 


$$G = \sum (pw_i \cdot cpb_{i-1}) - \sum (pb_i \cdot cpw_{i-1}). \tag{C.2b}$$


Expressions (C.2), (C.2a), and (C.2b) are mathematically equivalent variations of the 
standard computing formula for G. Expression (C.2a) corresponds to the traditional 
computing formulas for G given in Duncan and Duncan (1955). Expression (C.2b) 
is an alternative computing formula for G which Lieberson (1976) termed ND. 
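
The equivalence of (C.2a) and (C.2b) can be checked numerically. The sketch below is illustrative only: the function name `gini_c2a_c2b` is hypothetical, the input areas are assumed to be already sorted from low to high on p, and the counts are those of the five-area demonstration that follows.

```python
from itertools import accumulate


def gini_c2a_c2b(w, b):
    """Return G computed from (C.2a) and from (C.2b).

    Areas must already be in ascending order of area proportion White.
    """
    W, B = sum(w), sum(b)
    pw = [wi / W for wi in w]
    pb = [bi / B for bi in b]
    # The leading 0.0 is the cumulative share "before the first area", so
    # cpw[k] is the cumulative White share through the k-th ranked area.
    cpw = [0.0] + list(accumulate(pw))
    cpb = [0.0] + list(accumulate(pb))
    ranks = range(1, len(w) + 1)
    g_c2a = (sum(cpw[i] * cpb[i - 1] for i in ranks)
             - sum(cpb[i] * cpw[i - 1] for i in ranks))
    g_c2b = (sum(pw[i - 1] * cpb[i - 1] for i in ranks)
             - sum(pb[i - 1] * cpw[i - 1] for i in ranks))
    return g_c2a, g_c2b


g_a, g_b = gini_c2a_c2b([0, 65, 90, 95, 100], [100, 35, 10, 5, 0])
print(round(g_a, 3), round(g_b, 3))  # 0.876 0.876
```

The two results agree because the products of lagged cumulative shares that distinguish (C.2a) from (C.2b) appear in both sums and cancel.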


A Brief Demonstration 


I begin with an example that applies the terms introduced above to obtain G by the 
conventional formula and also demonstrates how the value of G can be obtained by 
the simpler approach of computing the difference of group means from percentile 
scores. The example case has just five areas, each one with 100 people. These are 
listed from high to low based on proportion White in the area. Appendix Fig. C.1 
lists the basic terms for each area. These include the group count terms ($t_i$, $w_i$, $b_i$), 
proportion White for the area ($p_i$), the proportion of the group population residing in 
the area ($pt_i$, $pw_i$, $pb_i$), and the cumulative proportion of the group population residing 
in areas with area proportion White at or below $p_i$ ($cpt_i$, $cpw_i$, $cpb_i$). 

Appendix Fig. C.2 presents terms that are used directly to calculate the value of 
G. The second and third columns in the figure present the terms used to calculate the 
value of G via the Lieberson (1976) “net difference” variation of the formula given 
in Duncan and Duncan (1955) (expression (C.2b) above). The difference between 
the sums for the two columns (i.e., 0.903 − 0.027) yields the value of G as 0.876. The 
fourth column gives the percentile score for each area as ranked on area proportion 
White (p). This is the residential outcome (y) relevant for computing G in the 
difference of means framework. The fifth and sixth columns give weighted 
sum calculations for obtaining separate group means on y for Whites and 
Blacks. Twice the difference of the sums for the two columns (i.e., 
$2 \cdot (Y_W - Y_B) = 2 \cdot (0.631 - 0.193) = 2 \cdot 0.438$) also yields the value of G as 0.876. 


Area    t_i    w_i    b_i    p_i     pt_i    pw_i    pb_i    cpt_i   cpw_i   cpb_i
5       100    100      0    1.000   0.200   0.286   0.000   1.000   1.000   1.000
4       100     95      5    0.950   0.200   0.271   0.033   0.800   0.714   1.000
3       100     90     10    0.900   0.200   0.257   0.067   0.600   0.443   0.967
2       100     65     35    0.650   0.200   0.186   0.233   0.400   0.186   0.900
1       100      0    100    0.000   0.200   0.000   0.667   0.200   0.000   0.667
Total   500    350    150                    1.000   1.000


Fig. C.1 Example of calculating the Gini index: intermediate terms 


Area    pw_i·cpb_{i-1}   pb_i·cpw_{i-1}   y_i     pw_i·y_i   pb_i·y_i
5       0.286            0.000            0.900   0.257      0.000
4       0.262            0.015            0.700   0.190      0.023
3       0.231            0.012            0.500   0.129      0.033
2       0.124            0.000            0.300   0.056      0.070
1       ---              ---              0.100   0.000      0.067
Sum     0.903            0.027                    0.631      0.193


Fig. C.2 Example of calculating the Gini index: final terms 


Getting on with the Derivation 


This example illustrates that the difference of means approach for obtaining G is 
simple and straightforward. The next task is to show how these formulas for G (C.2, 
C.2a, and C.2b) map onto the terms in the formulation of G as the White-Black 
difference of means $Y_W - Y_B$ on relative rank position on area proportion White (p). I 
apologize in advance for the fact that the derivation to follow is long and tedious. I 
suspect a simpler derivation can be given but I have not discovered it. What follows 
is one way to accomplish the task. 

My first step is to introduce the term $RRT_i$ as an alternative designation of $y_i$ as 
“relative rank” standing on area proportion White (p). Thus, 


$$RRT_i = y_i = (\sum pt_{i-1} + pt_i / 2) = (\sum t_{i-1} + t_i / 2) / T.$$


The “RR” in “RRT” refers to relative rank and the “T” indicates that it is calculated 
for the total of the combined population of White and Black households (ignoring 
other households). Multiplying relative rank by 100 gives a percentile score. Given 
these terms, the White-Black difference of means for $y_i$ is given by 


$$Y_W - Y_B = \sum pw_i \cdot y_i - \sum pb_i \cdot y_i, \tag{C.3}$$

or, alternatively, 

$$Y_W - Y_B = \sum pw_i \cdot RRT_i - \sum pb_i \cdot RRT_i.$$

Next I introduce two related terms: $RRW_i$ and $RRB_i$. $RRW_i$ registers average 
relative rank position on area proportion White (p) based on the distribution of 
White households only and is given by 


$$RRW_i = (\sum pw_{i-1} + pw_i / 2) = (\sum w_{i-1} + w_i / 2) / W.$$


$RRB_i$ registers the relative rank position on area proportion White (p) based on the 
distribution of Black households only and is given by 


$$RRB_i = (\sum pb_{i-1} + pb_i / 2) = (\sum b_{i-1} + b_i / 2) / B.$$


The terms $RRT_i$, $RRW_i$, and $RRB_i$ are closely interrelated. Specifically, each one 
can be defined in terms of the other two according to the following expressions. 


$$RRT_i = P \cdot RRW_i + Q \cdot RRB_i \tag{C.4a}$$
$$RRW_i = (RRT_i - Q \cdot RRB_i) / P \tag{C.4b}$$
$$RRB_i = (RRT_i - P \cdot RRW_i) / Q \tag{C.4c}$$


The basis for expression (C.4a) can be clarified as follows 


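$$
\begin{aligned}
RRT_i &= (\sum pt_{i-1} + pt_i / 2) \\
&= (\sum t_{i-1} + t_i / 2) / T \\
&= (\sum w_{i-1} + \sum b_{i-1} + w_i / 2 + b_i / 2) / T \\
&= (\sum w_{i-1} + w_i / 2) / T + (\sum b_{i-1} + b_i / 2) / T \\
&= (\sum w_{i-1} + w_i / 2) / [W \cdot (T / W)] + (\sum b_{i-1} + b_i / 2) / [B \cdot (T / B)] \\
&= (W / T) \cdot (\sum w_{i-1} + w_i / 2) / W + (B / T) \cdot (\sum b_{i-1} + b_i / 2) / B \\
&= P \cdot (\sum pw_{i-1} + pw_i / 2) + Q \cdot (\sum pb_{i-1} + pb_i / 2) \\
&= P \cdot RRW_i + Q \cdot RRB_i.
\end{aligned}
$$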


Expressions (C.4b) and (C.4c) are simple rearrangements of (C.4a). 
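
Expression (C.4a) can be confirmed numerically with a short sketch. The helper name `relative_ranks` is hypothetical and the counts are those of the five-area example, entered in ascending order of area proportion White; P and Q are taken here to be W/T and B/T as in the derivation above.

```python
def relative_ranks(counts):
    """Average relative rank of the households in each area, given counts
    for areas that are already ordered from low to high on p."""
    total = sum(counts)
    ranks, below = [], 0
    for c in counts:
        ranks.append((below + c / 2) / total)
        below += c
    return ranks


whites = [0, 65, 90, 95, 100]
blacks = [100, 35, 10, 5, 0]
totals = [wi + bi for wi, bi in zip(whites, blacks)]

P = sum(whites) / sum(totals)  # city proportion White, W / T
Q = sum(blacks) / sum(totals)  # city proportion Black, B / T
rrt = relative_ranks(totals)
rrw = relative_ranks(whites)
rrb = relative_ranks(blacks)

# (C.4a): RRT_i = P * RRW_i + Q * RRB_i for every area
c4a_holds = all(abs(t - (P * rw + Q * rb)) < 1e-12
                for t, rw, rb in zip(rrt, rrw, rrb))
print(c4a_holds)  # True
```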

The relationships among $RRT_i$, $RRW_i$, and $RRB_i$ help clarify how G relates to 
$Y_W - Y_B$. Expression (C.3) shows that the values of $RRT_i$ are directly used in 
computing $Y_W$ and $Y_B$. Expression (C.4a) establishes that $RRT_i$ can be given in terms of 
$RRW_i$ and $RRB_i$. These two terms can be incorporated into familiar computing 
expressions for G (yielding Eq. (C.5) below). 

Before reviewing this in more detail I first digress to note that values of $RRW_i$ 
and $RRB_i$ define points on the segregation curve, the well-known graphical 
representation of uneven distribution that supports an appealing geometric interpretation 
of G. The segregation curve is constructed by taking areas in ascending order of 
area proportion White (p) and then plotting cumulative proportion White 
($cpw_i = \sum w_i / W$) against cumulative proportion Black ($cpb_i = \sum b_i / B$). The curve 
is contrasted with the diagonal line that would result under conditions of exact even 
distribution and the value of G is given by the ratio of the area between the diagonal and 
the curve to the total area under the diagonal. The values of $RRW_i$ by $RRB_i$ fall on 
the midpoints of the line segments that form the segregation curve. 

The values of $RRW_i$ and $RRB_i$ can be used to directly calculate the value of G. To 
see this, start with the following familiar computing formula for G given by Duncan 
and Duncan (1955: 211) 


$$G = \sum pw_i \cdot \sum pb_{i-1} - \sum pb_i \cdot \sum pw_{i-1}. \tag{C.2, restated}$$


Then add 0 in the form of $\sum pw_i \cdot pb_i / 2 - \sum pw_i \cdot pb_i / 2$ to obtain 

$$G = [\sum pw_i \cdot \sum pb_{i-1} - \sum pb_i \cdot \sum pw_{i-1}] + [\sum pw_i \cdot pb_i / 2 - \sum pw_i \cdot pb_i / 2].$$

Rearrange terms 

$$G = \sum pw_i \cdot [\sum pb_{i-1} + pb_i / 2] - \sum pb_i \cdot [\sum pw_{i-1} + pw_i / 2].$$


Drawing on terms given earlier, substitute $RRB_i$ for $[\sum pb_{i-1} + pb_i / 2]$ and $RRW_i$ for 
$[\sum pw_{i-1} + pw_i / 2]$ to obtain 


$$G = \sum pw_i \cdot RRB_i - \sum pb_i \cdot RRW_i. \tag{C.5}$$


For later notational convenience, I designate $\sum pw_i \cdot RRB_i$ as $G_W$ and $\sum pb_i \cdot RRW_i$ 
as $G_B$ to get the compact expression 


$$G = G_W - G_B. \tag{C.5a}$$


Note that the terms $G_W$ and $G_B$ support straightforward substantive interpretations. 
Specifically, $G_W$ indicates the proportion of total comparisons between White 
and Black households where the White household is higher on area proportion 
White (p) and $G_B$ similarly indicates the proportion of comparisons where the Black 
household is higher.⁶ 
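
The pairwise interpretation of the two terms can be illustrated with a short sketch. It is a minimal check under stated assumptions: the function names are hypothetical, the areas are assumed to be sorted from low to high on p with no two areas tied on p, and comparisons between households in the same area are split in equal halves between the two outcomes.

```python
def relative_ranks(counts):
    """Average relative rank per area for counts ordered low to high on p."""
    total = sum(counts)
    ranks, below = [], 0
    for c in counts:
        ranks.append((below + c / 2) / total)
        below += c
    return ranks


def g_terms(w, b):
    """G_W = sum pw_i * RRB_i and G_B = sum pb_i * RRW_i, as in (C.5)."""
    W, B = sum(w), sum(b)
    rrw, rrb = relative_ranks(w), relative_ranks(b)
    g_w = sum(wi / W * r for wi, r in zip(w, rrb))
    g_b = sum(bi / B * r for bi, r in zip(b, rrw))
    return g_w, g_b


def pairwise_shares(w, b):
    """Share of White-Black household pairs in which the White (resp. Black)
    household lives in the higher-p area; same-area pairs are split in half."""
    W, B = sum(w), sum(b)
    white_higher = black_higher = 0.0
    for i, wi in enumerate(w):
        for j, bj in enumerate(b):
            share = (wi / W) * (bj / B)
            if i > j:
                white_higher += share
            elif i < j:
                black_higher += share
            else:
                white_higher += share / 2
                black_higher += share / 2
    return white_higher, black_higher


whites = [0, 65, 90, 95, 100]
blacks = [100, 35, 10, 5, 0]
g_w, g_b = g_terms(whites, blacks)
print(round(g_w, 3), round(g_b, 3), round(g_w - g_b, 3))  # 0.938 0.062 0.876
```

For these counts `pairwise_shares` returns the same two values as `g_terms`, which is the sense in which each term counts a share of White-Black comparisons.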


6 This corresponds closely to Lieberson’s (1976) index of net difference (ND) interpretation of 
G. The only difference computationally is how ties are handled in the computations. In Lieberson’s 
calculations, ties are dealt with separately. In this calculation, ties are apportioned in equal halves 
to each outcome. The resulting value of G (or ND) is identical. 


$$Y_W - Y_B = \sum pw_i \cdot RRT_i - \sum pb_i \cdot RRT_i. \tag{C.3, restated}$$


Expression (C.5) is very similar in form to expression (C.3) (restated here for 
convenience). This suggests that the relationship of G to $Y_W - Y_B$ can be expressed 
in terms of specific relationships between the core terms in (C.3) and (C.5). This is 
indeed the case. The first relationship involves the terms $\sum pw_i \cdot RRB_i$ from (C.5) 
and $\sum pw_i \cdot RRT_i$ from (C.3). Their relationship can be given as 


$$\sum pw_i \cdot RRB_i = (\sum pw_i \cdot RRT_i - P/2) / Q. \tag{C.6}$$


The second relationship involves the terms $\sum pb_i \cdot RRW_i$ from (C.5) and 
$\sum pb_i \cdot RRT_i$ from (C.3). Their relationship can be given as 


$$\sum pb_i \cdot RRW_i = (\sum pb_i \cdot RRT_i - Q/2) / P. \tag{C.7}$$


Similarly, the central terms in (C.3) for $Y_W - Y_B$ can be expressed in relation to the 
terms in (C.5) for G based on 


$$Y_W = \sum pw_i \cdot RRT_i = Q \cdot \sum pw_i \cdot RRB_i + P/2 = Q \cdot G_W + P/2, \text{ and} \tag{C.8}$$
$$Y_B = \sum pb_i \cdot RRT_i = P \cdot \sum pb_i \cdot RRW_i + Q/2 = P \cdot G_B + Q/2. \tag{C.9}$$


Restating these using more compact notation yields 


$$G_W = (Y_W - P/2) / Q \tag{C.6a}$$
$$G_B = (Y_B - Q/2) / P \tag{C.7a}$$
$$Y_W = Q \cdot G_W + P/2 \tag{C.8a}$$
$$Y_B = P \cdot G_B + Q/2 \tag{C.9a}$$


Establishing Expressions (C.6, C.6a) and (C.8, C.8a) 

For the sake of completeness I show here how expressions (C.6, C.6a) and (C.8, 
C.8a) can be obtained. I begin by drawing on (C.4c) to restate the term $\sum pw_i \cdot RRB_i$ 
from (C.5) and then rearrange the result as follows. 

$$\sum pw_i \cdot RRB_i = \sum pw_i \cdot [(RRT_i - P \cdot RRW_i) / Q]$$
$$\sum pw_i \cdot RRB_i = \sum pw_i \cdot [(RRT_i / Q) - (RRW_i \cdot P/Q)]$$
$$\sum pw_i \cdot RRB_i = \sum pw_i \cdot (RRT_i / Q) - \sum pw_i \cdot (RRW_i \cdot P/Q)$$
$$\sum pw_i \cdot RRB_i = (\sum pw_i \cdot RRT_i) / Q - (\sum pw_i \cdot RRW_i)(P/Q)$$

The value of the term $\sum pw_i \cdot RRW_i$ is 0.5 because the mean of relative rank 
position is necessarily 0.5 = ½. Accordingly, the last expression can be simplified by 
substituting (½) for $\sum pw_i \cdot RRW_i$ to obtain (C.6) as follows 

$$\sum pw_i \cdot RRB_i = (\sum pw_i \cdot RRT_i) / Q - (½)(P/Q)$$
$$\sum pw_i \cdot RRB_i = (\sum pw_i \cdot RRT_i - P/2) / Q \tag{C.6, restated}$$

Or in more compact notation 

$$G_W = (Y_W - P/2) / Q \tag{C.6a, restated}$$

Reversing sides and rearranging terms to isolate $Y_W$ yields 

$$(Y_W - P/2) / Q = G_W$$
$$Y_W / Q - P/2Q = G_W$$
$$Y_W / Q = G_W + P/2Q$$
$$Y_W = Q \cdot (G_W + P/2Q)$$
$$Y_W = Q \cdot G_W + P/2 \tag{C.8a, restated}$$

Expanding to less compact notation 

$$\sum pw_i \cdot RRT_i = Q \cdot \sum pw_i \cdot RRB_i + P/2. \tag{C.8, restated}$$


Establishing Expressions (C.7, C.7a) and (C.9, C.9a) 

Next I show here how expressions (C.7, C.7a) and (C.9, C.9a) can be obtained. I 
begin by drawing on (C.4b) to restate the term $\sum pb_i \cdot RRW_i$ from (C.5) and then 
rearrange the result as follows. 

$$\sum pb_i \cdot RRW_i = \sum pb_i \cdot [(RRT_i - Q \cdot RRB_i) / P]$$
$$\sum pb_i \cdot RRW_i = \sum pb_i \cdot [(RRT_i / P) - (RRB_i \cdot Q/P)]$$
$$\sum pb_i \cdot RRW_i = \sum pb_i \cdot (RRT_i / P) - \sum pb_i \cdot (RRB_i \cdot Q/P)$$
$$\sum pb_i \cdot RRW_i = (\sum pb_i \cdot RRT_i) / P - (\sum pb_i \cdot RRB_i)(Q/P)$$

Since $\sum pb_i \cdot RRB_i$ is 0.5 = ½, the last expression can be simplified by substituting 
(½) for $\sum pb_i \cdot RRB_i$ to obtain (C.7) as follows 

$$\sum pb_i \cdot RRW_i = (\sum pb_i \cdot RRT_i) / P - (½)(Q/P)$$
$$\sum pb_i \cdot RRW_i = (\sum pb_i \cdot RRT_i - Q/2) / P. \tag{C.7, restated}$$

Or more compactly 

$$G_B = (Y_B - Q/2) / P. \tag{C.7a, restated}$$

Reversing sides and rearranging terms to isolate $Y_B$ yields 

$$Y_B / P - Q/2P = G_B$$
$$Y_B / P = G_B + Q/2P$$
$$Y_B = P \cdot (G_B + Q/2P)$$
$$Y_B = P \cdot G_B + Q/2 \tag{C.9a, restated}$$

Expanding to less compact notation 

$$\sum pb_i \cdot RRT_i = P \cdot \sum pb_i \cdot RRW_i + Q/2. \tag{C.9, restated}$$


Some Implications of Expressions (C.6) and (C.7) 


Based on (C.6) and (C.7), G as given in (C.5) can be obtained from the core terms 
that define $Y_W - Y_B$ in (C.3) as follows 


$$G = (\sum pw_i \cdot RRT_i - P/2) / Q - (\sum pb_i \cdot RRT_i - Q/2) / P \tag{C.10}$$

or, in more compact notation, 

$$G = (Y_W - P/2) / Q - (Y_B - Q/2) / P. \tag{C.10a}$$


Similarly, based on (C.8) and (C.9), the term $Y_W - Y_B$ in (C.3) can be obtained from 
the terms that define G in (C.5) as follows 


$$Y_W - Y_B = (Q \cdot \sum pw_i \cdot RRB_i + P/2) - (P \cdot \sum pb_i \cdot RRW_i + Q/2) \tag{C.11}$$

or, in more compact notation, 

$$Y_W - Y_B = (Q \cdot G_W + P/2) - (P \cdot G_B + Q/2). \tag{C.11a}$$

These results establish that the value of the Gini Index (G) can be directly and 
exactly mapped onto the terms of the group difference of means ($Y_W - Y_B$) on 
residential outcomes (y) scored on the basis of relative rank position on area group 
proportion (p). 
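
The mappings in (C.10a) and (C.11a) can be verified numerically for a city in which P and Q differ. The following is a sketch only: the function names are hypothetical, and the counts are those of the five-area example (P = 0.7, Q = 0.3) entered in ascending order of area proportion White.

```python
def relative_ranks(counts):
    """Average relative rank per area for counts ordered low to high on p."""
    total = sum(counts)
    ranks, below = [], 0
    for c in counts:
        ranks.append((below + c / 2) / total)
        below += c
    return ranks


def core_terms(w, b):
    """Return P, Q, Y_W, Y_B, G_W, G_B for areas sorted ascending on p."""
    W, B = sum(w), sum(b)
    T = W + B
    rrt = relative_ranks([wi + bi for wi, bi in zip(w, b)])
    rrw, rrb = relative_ranks(w), relative_ranks(b)
    y_w = sum(wi / W * r for wi, r in zip(w, rrt))
    y_b = sum(bi / B * r for bi, r in zip(b, rrt))
    g_w = sum(wi / W * r for wi, r in zip(w, rrb))
    g_b = sum(bi / B * r for bi, r in zip(b, rrw))
    return W / T, B / T, y_w, y_b, g_w, g_b


P, Q, y_w, y_b, g_w, g_b = core_terms([0, 65, 90, 95, 100], [100, 35, 10, 5, 0])

g_from_means = (y_w - P / 2) / Q - (y_b - Q / 2) / P   # (C.10a)
diff_from_g = (Q * g_w + P / 2) - (P * g_b + Q / 2)    # (C.11a)

print(round(g_w - g_b, 3), round(g_from_means, 3))  # 0.876 0.876
print(round(y_w - y_b, 3), round(diff_from_g, 3))   # 0.438 0.438
```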
The Role of P and Q in Scaling Terms when Groups Differ in Relative Size 


The results just reviewed show that, while the relationship between G and 
($Y_W - Y_B$) is exact, it also is complex. Expressions (C.10, C.10a) and (C.11, C.11a) 
clarify how scores for G map onto scores for ($Y_W - Y_B$). In this, it is clear that the 
terms for relative group size, P and Q, play important roles. How can this be 
understood? One answer to that question is that the operations involving P and Q in 
these expressions rescale the core terms of G so they will map onto the core terms 
of $Y_W - Y_B$ and vice versa. This is necessary because the core terms in G and 
$Y_W - Y_B$ have different logical ranges. Accordingly, the operations involving P and 
Q in expression (C.10) rescale the core terms of $Y_W - Y_B$ so they will take the same 
value as their corresponding terms in G. Similarly, the operations involving P and Q 
in expression (C.11) rescale the core terms of G so they will take the same value as 
their corresponding terms in $Y_W - Y_B$. 

The logical ranges for both G and its core terms are constant across all combinations 
of P and Q. The core term $\sum pw_i \cdot RRB_i$ (i.e., $G_W$) has a logical range of 0.5 
based on having a minimum possible value of 0.5 under even distribution and a 
maximum value of 1.0 under complete segregation. The core term $\sum pb_i \cdot RRW_i$ 
(i.e., $G_B$) also has a logical range of 0.5 based on having a minimum possible value 
of 0.0 under complete segregation and a maximum value of 0.5 under even distribution. 
Thus, G ranges from a minimum of 0.0 under even distribution based on 


$$G = \sum pw_i \cdot RRB_i - \sum pb_i \cdot RRW_i = 0.5 - 0.5 = 0.0$$

to a maximum of 1.0 under complete segregation based on 

$$G = \sum pw_i \cdot RRB_i - \sum pb_i \cdot RRW_i = 1.0 - 0.0 = 1.0$$


The logical range for $Y_W - Y_B$ also is always constant but it is 0.5 not 1.0. This 
accounts for why G is divided by 2 in expression (C.1). Note, however, that the logical 
ranges for the two core terms $Y_W$ and $Y_B$ are not constants. In each case one 
boundary of their logical range is a constant but the other boundary varies with the 
values of P and Q. For the term $Y_W = \sum pw_i \cdot RRT_i$ the fixed boundary is its minimum 
possible value of 0.5, which occurs under even distribution. Its upper boundary 
(i.e., maximum possible value) is given by Q + P/2, which occurs under 
complete segregation and varies in exact value with city ethnic composition. For the 
term $Y_B = \sum pb_i \cdot RRT_i$, the fixed boundary of its logical range is 0.5, its maximum 
possible value which occurs under even distribution. Its lower boundary (i.e., the 
minimum possible value) is given by Q/2 which occurs under complete segregation 
and varies in exact value with city ethnic composition. 

Thus, $Y_W - Y_B$ ranges from a minimum of 0.0 under even distribution based on 


$$Y_W - Y_B = \sum pw_i \cdot RRT_i - \sum pb_i \cdot RRT_i = 0.5 - 0.5 = 0.0$$


to a maximum of 0.5 under complete segregation based on 

$$Y_W - Y_B = \sum pw_i \cdot RRT_i - \sum pb_i \cdot RRT_i = (Q + P/2) - (Q/2) = Q/2 + P/2 = (Q + P)/2 = 0.5.$$
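
These boundary values can be illustrated with two small hypothetical cities, one completely segregated and one evenly distributed, each with P = 0.75 and Q = 0.25. The function name `rank_means` and the counts are assumptions chosen only for illustration.

```python
def rank_means(w, b):
    """Return (Y_W, Y_B), the group means on relative rank position."""
    W, B = sum(w), sum(b)
    T = W + B
    # Order areas from low to high on area proportion White (p).
    areas = sorted(zip(w, b), key=lambda a: a[0] / (a[0] + a[1]))
    below = 0
    y_w = y_b = 0.0
    for wi, bi in areas:
        ti = wi + bi
        y = (below + ti / 2) / T  # average relative rank in this area
        y_w += wi * y / W
        y_b += bi * y / B
        below += ti
    return y_w, y_b


P, Q = 0.75, 0.25

# Complete segregation: one all-Black area and one all-White area.
seg_w, seg_b = rank_means([0, 300], [100, 0])
print(seg_w, Q + P / 2)   # 0.625 0.625
print(seg_b, Q / 2)       # 0.125 0.125

# Even distribution: every area has proportion White equal to P.
even_w, even_b = rank_means([30, 60], [10, 20])
print(round(even_w, 3), round(even_b, 3))  # 0.5 0.5
```

In the segregated city the difference of means is 0.625 − 0.125 = 0.5, and in the even city it is 0.0, matching the two ends of the logical range.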


In light of these points, Expression (C.10) can now be understood as follows. The 
values of P/2 and Q in the term $(\sum pw_i \cdot RRT_i - P/2) / Q$ rescale the value of the 
core term $\sum pw_i \cdot RRT_i$ used in computing $Y_W$ in $Y_W - Y_B$ to map its position in the 
logical range of 0.5 to (Q + P/2) onto the correct position in the logical range of 0.5 
to 1.0 for the parallel core term $\sum pw_i \cdot RRB_i$ used in computing G. Similarly, the 
values of Q/2 and P in the term $(\sum pb_i \cdot RRT_i - Q/2) / P$ rescale the core term 
$\sum pb_i \cdot RRT_i$ used in computing $Y_B$ in $Y_W - Y_B$ to map its position in the logical 
range of Q/2 to 0.5 onto the correct position in the logical range of 0.0 to 0.5 for the 
parallel core term $\sum pb_i \cdot RRW_i$ used in computing G. 

Expression (C.11) can be interpreted in a similar way. P/2 and Q in the term 
$(Q \cdot \sum pw_i \cdot RRB_i + P/2)$ rescale the core term $\sum pw_i \cdot RRB_i$ used in computing 
G to map its position in the logical range of 0.5 to 1.0 onto the correct position in 
the logical range of 0.5 to Q + P/2 for the core term $\sum pw_i \cdot RRT_i$ used in computing 
$Y_W - Y_B$. Similarly, Q/2 and P in the term $(P \cdot \sum pb_i \cdot RRW_i + Q/2)$ rescale 
the core term $\sum pb_i \cdot RRW_i$ used in computing G to map its position in the logical 
range of 0.0 to 0.5 onto the correct position in the logical range of Q/2 to 0.5 for the 
core term $\sum pb_i \cdot RRT_i$ used in computing $Y_W - Y_B$. 


The Special Circumstance When P=Q 


Things are relatively simple when P = Q. This can be seen by rearranging terms in 
(C.10) to obtain the alternative expression. 


$$G = (1/Q) \cdot \sum pw_i \cdot RRT_i - (1/P) \cdot \sum pb_i \cdot RRT_i + Q/2P - P/2Q \tag{C.12}$$

When P = Q, this resolves to 

$$G = [1/(1/2)] \cdot \sum pw_i \cdot RRT_i - [1/(1/2)] \cdot \sum pb_i \cdot RRT_i + (1/2)/[2 \cdot (1/2)] - (1/2)/[2 \cdot (1/2)]$$
$$G = 2 \cdot \sum pw_i \cdot RRT_i - 2 \cdot \sum pb_i \cdot RRT_i + (1/2) - (1/2)$$
$$G = 2 \cdot (\sum pw_i \cdot RRT_i - \sum pb_i \cdot RRT_i)$$
$$G/2 = \sum pw_i \cdot RRT_i - \sum pb_i \cdot RRT_i$$
$$G/2 = Y_W - Y_B. \tag{C.1, restated}$$

This corresponds to expression (C.1) presented at the beginning of this section. 
Similarly, rearranging terms in (C.11) leads to the following alternative 
expression. 


$$Y_W - Y_B = Q \cdot \sum pw_i \cdot RRB_i - P \cdot \sum pb_i \cdot RRW_i + P/2 - Q/2 \tag{C.13}$$

When P = Q, this resolves to 

$$Y_W - Y_B = (1/2) \cdot \sum pw_i \cdot RRB_i - (1/2) \cdot \sum pb_i \cdot RRW_i + (1/2)/2 - (1/2)/2$$
$$Y_W - Y_B = (1/2) \cdot (\sum pw_i \cdot RRB_i - \sum pb_i \cdot RRW_i)$$
$$Y_W - Y_B = G/2 \tag{C.1, restated}$$


And this also corresponds to expression (C.1). 


Summary Comments on Formulating G as a Difference of Means ($Y_W - Y_B$) 
on Relative Rank 


The relationship in expression (C.1) now can be placed in broader context as follows. 
The core terms that define G in expression (C.2) map directly and exactly onto 
the core terms that define $Y_W - Y_B$ in expression (C.3). Consequently, G can be 
described as registering the White-Black difference in average relative rank on area 
proportion White (p). Examined in the “natural” metric of relative rank scores, the 
difference of means $Y_W - Y_B$ has a logical range of 0.0-0.5 while the logical range 
of G is 0.0-1.0. Hence, expression (C.1) equates the two measures based on 
$Y_W - Y_B = G/2$. 


The Dissimilarity Index (D) — A Special Case of the Gini Index (G) 


The dissimilarity or delta index (D) is closely related to the Gini Index (G). More 
specifically, D can be described as a special case of G where G is computed after 
areal units ordered on area group proportion scores (p) are collapsed into two 
categories: areas where the group proportion score exceeds the city-wide group 
proportion (i.e., p > P) and areas where it does not (i.e., p ≤ P). Based on this, D can be 
expressed as a difference of group means on residential outcomes (y) scored from 
area group proportions (p) in a manner comparable to that just outlined for G. 

D and G both are intimately related to the segregation curve, a graphical device 
for depicting uneven distribution popularized by Duncan and Duncan (1955). An 
example of a standard segregation curve is shown in Fig. C.3. The curve is based on 
block group data for Whites and Blacks in the Houston, Texas metropolitan area in 
2000 and is constructed as follows. First the areas (in this case block groups) are 
placed in ascending order based on proportion White (p) in the area. Then the curve 
is traced by drawing line segments connecting the sequence of (x,y) pairings for the 
cumulated proportion of the White population (on the y-axis) and the cumulated 
proportion of the Black population (on the x-axis) as areas are taken in ascending 
order on the value of p. The resulting curve is contrasted with the diagonal line 
between the starting point (0,0) and ending point (1,1) of the curve. The diagonal 
represents the segregation curve that would obtain under the condition of exact even 
distribution. The gap between the curve and the diagonal visually indicates the 
degree of departure from even distribution. 

Fig. C.3 Example Segregation Curve for White-Black Comparison (Note: Units are ordered from 
low to high on area proportion White. Gini index is 84.7, Delta is 69.0) 

As is well known, G and D both have direct quantitative and geometric relations 
to the curve’s departure from the diagonal. G registers the departure quantitatively 
based on the ratio of the area between the curve and the diagonal to the total area 
under the diagonal. In the example shown, the value of G is 84.7. D registers the 
degree of departure quantitatively based on the maximum vertical difference 
between the curve and the diagonal and in the example shown has a value of 69.0. 
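
Both geometric readings can be sketched for a small city. The block below is illustrative rather than a reproduction of the Houston calculation: the function names are hypothetical, the area under the curve is obtained by summing trapezoids, and the counts are those of the five-area example used earlier in this appendix.

```python
from itertools import accumulate


def segregation_curve(w, b):
    """Return curve points (x, y) = (cumulative Black share, cumulative
    White share) with areas taken in ascending order of proportion White."""
    W, B = sum(w), sum(b)
    areas = sorted(zip(w, b), key=lambda a: a[0] / (a[0] + a[1]))
    xs = [0.0] + list(accumulate(bi / B for _, bi in areas))
    ys = [0.0] + list(accumulate(wi / W for wi, _ in areas))
    return list(zip(xs, ys))


def gini_and_delta(points):
    """G: area between diagonal and curve over the area under the diagonal.
    D: maximum vertical distance between the diagonal and the curve."""
    area_under_curve = sum((x1 - x0) * (y0 + y1) / 2
                           for (x0, y0), (x1, y1) in zip(points, points[1:]))
    g = (0.5 - area_under_curve) / 0.5
    d = max(x - y for x, y in points)
    return g, d


curve = segregation_curve([0, 65, 90, 95, 100], [100, 35, 10, 5, 0])
g, d = gini_and_delta(curve)
print(round(g, 3), round(d, 3))  # 0.876 0.714
```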

The geometric relationships to the segregation curve for G and D highlight an 
important difference between the two measures. The area interpretation of G makes 
it clear that its value is determined by the shape of the full curve. In contrast, the 
vertical line interpretation of D makes it clear that its value is determined by a single 
point on the segregation curve. Accordingly, G responds to any residential shifts 
that promote more even distribution (i.e., that reduce the area between the diagonal 
and the curve) while D responds to such changes only if they affect the position of 
a particular point on the curve. The difference is highlighted in Fig. C.4. Here the 
segregation curve in the first graph is supplemented with a second segregation 
curve. This is a three-point segregation curve defined by the triangle involving three 
points from the full segregation curve: the two end points (0,0) and (1,1) of the 
diagonal and the point on the full curve where the vertical distance between the 
curve and the diagonal is at its maximum. This last point determines the value of D 
so I designate it as $(x_D, y_D)$. In the example shown it is (0.822, 0.132).⁷ 

Fig. C.4 Example segregation curves for white-black comparison (Note: Units are ordered from 
low to high on area proportion White. Gini index is 84.7, Delta is 69.0) 


7 Becker et al. (1978) present a similar graphical analysis of D. 

Fig. C.5 Example segregation curves for G and D with details (Note: Units are ordered from low 
to high on area proportion White. Gini index is 84.7, Delta is 69.0 = (86.8-17.8)) 


D Is G Calculated from a Special Three-Point Segregation Curve 


D can be seen as a special case of G calculated for the three-point segregation curve 
defined by the points (0,0), $(x_D, y_D)$, and (1,1). More specifically, D represents the 
minimum value of G that can obtain for a curve that has the point $(x_D, y_D)$. This is 
depicted graphically in the detailed example in Appendix Fig. C.5. The relationships 
involved can be outlined in a general way as follows. Recall that the value of 
G is given by A/T where A is the area between the diagonal and the segregation 
curve and T is the total area under the diagonal which is ½. For the three-point 
segregation curve associated with D, A is equal to the area of the triangle that forms the 
three-point segregation curve. Accordingly, A = ½·b·h where A is the area of the 
triangle, b is the length of the base of the triangle, and h is the height of the triangle. 
The base of the triangle is the diagonal and thus b is equal to the length of the 
diagonal which is √2. The height of the triangle (h) is equal to the length of the line that 
extends perpendicular from the diagonal and ends on the segregation curve at the 
point $(x_D, y_D)$. This line is a side of a right isosceles triangle whose base has a length 
equal to the value of D, the maximum vertical distance from the segregation curve 
to the diagonal. Thus, h = D/√2. 

It follows that the area (A) between the diagonal and the three-point segregation 
curve for D is given by A = ½·b·h = ½·√2·(D/√2) = ½·D. It also follows that the 
value of the Gini Index (G) for the three-point segregation curve is given by 
G = A/T = (½·D)/½, which resolves to D. This establishes that the value of D is 
equivalent to the value of G for a simplified segregation curve analysis in which all 
areas of the city are grouped into just two categories: all areas where p < P, and all 
areas where p ≥ P. 
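
The equivalence can be checked by collapsing a city into the two categories and recomputing G. This is a sketch under stated assumptions: the function names are hypothetical, G is computed with the (C.2b) form, D is computed with the standard formula of one half the sum of absolute differences in area shares (a formula not stated in this section), and the counts are those of the five-area example.

```python
def gini(w, b):
    """G via the (C.2b) form; areas are sorted ascending on p internally."""
    W, B = sum(w), sum(b)
    areas = sorted(zip(w, b), key=lambda a: a[0] / (a[0] + a[1]))
    g = cpw = cpb = 0.0  # cpw, cpb: cumulative shares through the prior area
    for wi, bi in areas:
        g += (wi / W) * cpb - (bi / B) * cpw
        cpw += wi / W
        cpb += bi / B
    return g


def collapse(w, b):
    """Group areas into two categories: p < P and p >= P."""
    W, B = sum(w), sum(b)
    T = W + B
    low_w = low_b = high_w = high_b = 0
    for wi, bi in zip(w, b):
        # Integer cross-multiplication tests p >= P without rounding error.
        if wi * T >= W * (wi + bi):
            high_w, high_b = high_w + wi, high_b + bi
        else:
            low_w, low_b = low_w + wi, low_b + bi
    return [low_w, high_w], [low_b, high_b]


def dissimilarity(w, b):
    """Standard computing formula: D = 0.5 * sum |w_i/W - b_i/B|."""
    W, B = sum(w), sum(b)
    return 0.5 * sum(abs(wi / W - bi / B) for wi, bi in zip(w, b))


whites = [0, 65, 90, 95, 100]
blacks = [100, 35, 10, 5, 0]
w2, b2 = collapse(whites, blacks)
print(round(gini(w2, b2), 3), round(dissimilarity(whites, blacks), 3))  # 0.714 0.714
print(round(gini(whites, blacks), 3))  # 0.876
```

G for the collapsed two-category city equals D for the full city, while G for the full five-area city is larger.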

The comparison of the three-point segregation curve with the full curve highlights 
several characteristics of D. One is that D ≤ G because the full segregation curve 
for G can never be “inside” the three-point segregation curve for D. Another is that 
D is insensitive to variations in residential distribution other than the distinction 
between residing in areas where p ≥ P or not. Finally, D can be understood as the 
minimum possible value of G for a curve containing the point $(x_D, y_D)$ because D 
treats Whites and Blacks as experiencing only two relative rank scores and this 
maximizes ties between Whites and Blacks on relative ranks. Expanding the curve 
to consider more points cannot reduce the value of G as the construction principles 
are such that the segregation curve can only stay the same or expand outward from 
the three-point curve if more points are added to the curve. 


D Is a Simple Difference of Group Proportions Residing in Areas 
Where p ≥ P 


There is an alternative computing approach for D that is simple and carries an 
appealing substantive interpretation. It is based on understanding D as the difference 
in group proportions residing in areas where p ≥ P. This interpretation traces 
to the fact that the maximum vertical difference between the curve and the diagonal 
occurs at a particular point on the segregation curve. Specifically, it is first 
encountered at the end of the line segment on the curve for the last areal unit where p < P. 
It then is maintained for all subsequent points on the curve for areas where p = P. It 
is last encountered at the beginning of the line segment on the curve for the first 
areal unit where p > P. 

When there are no areas where p = P, the maximum vertical difference between 
the curve and the diagonal will be at a single point: the point where the line segment 
for the last area where p < P connects with the line segment for the first area where 
p > P. When some areas have p = P, the maximum vertical difference will be found 
at the beginning and end of the line segment formed for these areas. So it is correct 
to say that the maximum vertical distance corresponding to the value of D can be 
found at the following locations on the line segments that create the segregation 
curve. 


• the end point of the line segment for the last area where p < P 
• any point on line segments for areas where p = P 
• the beginning of the line segment for the first area where p > P 

Because the vertical distance is at its maximum at the beginning and end of line 
segments where p = P, one can say the maximum vertical distance is found at 


• the end point of the line segment for the last area where p < P 
• the beginning point of the line segment for the first area where p > P 

This can be seen by reviewing the construction of the segregation curve in 
more detail. Starting at (0,0) the curve is formed by plotting line segments 
connecting (x,y) points for group population shares that are being cumulated over 
areas taken in ascending order of p. Except in the unusual case of exact even 
distribution, p < P for the initial areas and the line segments plotted for these areas 
will have a slope of less than 1. Accordingly, the curve initially falls away from the 
diagonal and the vertical distance between the curve and the diagonal increases 
with each successive area so long as p < P with the vertical distance being greatest 
at the end point of the line segment for the area. The maximum vertical distance is 
first reached when the sequence arrives at the first area where p ≥ P. If the next 
area plotted is one where p = P (exactly), the line segment for that area will have 
a slope of 1 and will run parallel to the diagonal. The maximum vertical distance 
is maintained for all subsequent areas where p = P (exactly).⁸ This changes when 
the sequence reaches the first area where p > P. At this point, the slope of the line 
segment plotted for that area will be greater than 1 and the segregation curve 
begins rising faster than the diagonal. Accordingly, the vertical distance between 
the curve and the diagonal will start to decline. It will continue to decline with 
each successive area in the sequence and the curve ultimately rises back to the 
diagonal to connect with the end point (1,1). 

This discussion makes it clear that the value of D can be understood as a simple
difference of group proportions. Specifically, the value of D is equal to the difference
between the proportions of Whites and Blacks, respectively, that reside in areas
where Whites are represented at or above the level for the city overall (i.e., p≥P).
For convenience, I designate the (x,y) pair for the beginning point of the line segment
for the first area where p≥P as (x_D, y_D). Applying the subscript “D” indicates
that the values of x_D and y_D determine the value of D. The values of x_D and y_D register
the proportions of Blacks and Whites, respectively, that reside in areas where
Whites are under-represented (i.e., areas where p<P). Under even distribution the
value of y_D would be equal to x_D. In light of this, the value of D is given by (x_D − y_D),
the vertical distance between the diagonal and the curve at this point. The values
(1 − x_D) and (1 − y_D) similarly indicate the proportions of Blacks and Whites,
respectively, who reside in areas where Whites are represented at parity or higher
(i.e., areas where p≥P). D also can be obtained from ([1 − y_D] − [1 − x_D]). This
expression supports an appealing substantive interpretation of D; it is the White-Black
difference in the proportions that reside in areas where proportion White is at
or above the level of the city overall.

The example presented in Fig. C.5 shows that 82.2% of Blacks and 13.2% of
Whites reside in areal units where Whites are under-represented (i.e., p<P). It
likewise shows that 86.8% of Whites and 17.8% of Blacks reside in areal units
where the presence of Whites equals or exceeds the citywide level (i.e., p≥P). The
value of D can be obtained in either of two ways. It can be obtained from the Black-White
difference in percentages residing in areas where Whites are under-represented
(i.e., D = 82.2 − 13.2 = 69.0). Alternatively, and more appropriately for the purposes
of the present task, it can be obtained from the White-Black difference in percentages
residing in areas where Whites are represented at or above the level for the
city overall (i.e., D = 86.8 − 17.8 = 69.0).


⁸ These points are noted in Becker et al. (1978) and Duncan and Duncan (1955).


The Dissimilarity or Delta Index (D) — Alternative Functions for Scaling 
Contact 


The above discussion establishes at least two viable ways to score individual resi- 
dential outcomes (y) based on area group proportion scores (p) such that delta (D) 
can be obtained as a simple difference of group means. The first option is based on 
viewing D as a special case of the Gini Index (G). In this approach, y is scored as 
the relative rank (percentile) transformation of p applied to the two-category resi- 
dential scheme for the special case of the three-point segregation curve described 
above. In this case delta (D) can be given by an expression comparable to Expression 
(C.1) introduced earlier for G. Specifically, 


$$
Y_W - Y_B = D/2, \quad \text{or, alternatively,} \quad 2\,(Y_W - Y_B) = D
$$


where D can be understood as a special case of G. 

The second alternative involves an even simpler scoring scheme for y. This scaling
function draws on the mundane fact that a proportion is equivalent to the mean
for a variable that is scored 0 or 1. The above discussion established that D is equal
to the White-Black difference in proportions residing in areas where p≥P.
Accordingly, the group proportions involved can be restated as group means on a
variable that is scored 1 for individuals who reside in an area that reaches or exceeds
parity on contact with Whites (i.e., areas where p≥P) and 0 otherwise (i.e.,
when p<P). This provides the basis for obtaining D by scoring residential outcomes
for individuals (y) as 1 for areas where proportion White is at or above
parity (i.e., p≥P) and 0 otherwise. Then compute the means for Whites and Blacks
separately to obtain the value of D according to


$$
Y_W - Y_B = D.
$$


One benefit of the resulting difference of means formulation of D is that it calls 
attention to how segregation as measured by D is linked with individual residential 
outcomes. Specifically, this formulation highlights the fact that D registers group 
differences in average contact with Whites when contact is rescaled from its original,
“natural” metric of p, which can vary continuously over the range of 0-1
(inclusive), to a binary scoring of either 0 or 1. Seeing D formulated in this way
may raise questions concerning the methodological implications and desirability of
collapsing p to a dichotomy when assessing group differences in exposure. I leave
these issues for discussion elsewhere.
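
The binary scoring described in this section can be checked numerically. The following is a minimal sketch in Python, assuming hypothetical area counts; the `whites` and `blacks` lists and both function names are illustrative and do not come from the text. It computes D from the conventional formula, one half of the sum of |w_i/W − b_i/B| over areas, and again as the White-Black difference of means on y scored 1 where p≥P and 0 otherwise.

```python
# Hypothetical area counts (illustrative only, not data from the text).
whites = [5, 20, 60, 90, 125]
blacks = [95, 60, 30, 10, 5]


def dissimilarity_standard(w, b):
    """Conventional formula: D = 0.5 * sum(|w_i/W - b_i/B|)."""
    W, B = sum(w), sum(b)
    return 0.5 * sum(abs(wi / W - bi / B) for wi, bi in zip(w, b))


def dissimilarity_as_mean_difference(w, b):
    """D as the White-Black difference of means on a 0/1 outcome y."""
    W, B = sum(w), sum(b)
    P = W / (W + B)  # pairwise city proportion White
    # y = 1 for everyone living in an area at or above parity (p >= P).
    y = [1 if wi / (wi + bi) >= P else 0 for wi, bi in zip(w, b)]
    mean_y_white = sum(wi * yi for wi, yi in zip(w, y)) / W
    mean_y_black = sum(bi * yi for bi, yi in zip(b, y)) / B
    return mean_y_white - mean_y_black


d_standard = dissimilarity_standard(whites, blacks)
d_means = dissimilarity_as_mean_difference(whites, blacks)
print(round(d_standard, 4), round(d_means, 4))
```

With these counts the two routes agree, which is the point of the difference-of-means restatement: nothing beyond the 0/1 scoring of p relative to P is needed to recover D.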


Alternative Graphical Explorations of Relative Rank Position 


Before concluding this appendix chapter, I offer additional comments on the topic 
of relative rank position. The preceding discussion establishes that the values of G 
and D reflect group differences in relative rank position on area proportion White 
(p). It is surprising that this is not already more widely appreciated because G and 
D have close relationships with the segregation curve which is an appealing graphi- 
cal device for comparing group differences in distribution over areas ranked on 
proportion White (p). With this in mind it is instructive to directly consider group 
distributions on relative rank position. 

To that end, Fig. C.6 presents graphs that help provide additional insight into 
how relative rank position relates to group distributions. The figure presents 6 
graphs. Each graph plots three curves that are constructed by first ordering areas 
from low to high on area proportion White (p) and then plotting the cumulated pro- 
portions of the White and Black population against the cumulated proportion of the 
total (combined White and Black) population and then also plotting the cumulated 
proportion of the total population against itself to form a diagonal line rising from 
(0,0) to (1,1). These plotted values are designated here as

$$
\begin{aligned}
\mathit{cpw}_i &= \sum \mathit{pw}_i = \sum w_i / W, \\
\mathit{cpb}_i &= \sum \mathit{pb}_i = \sum b_i / B, \text{ and} \\
\mathit{cpt}_i &= \sum \mathit{pt}_i = \sum t_i / T.
\end{aligned}
$$

[Fig. C.6: six panels arranged in two rows and three columns. Vertical axes: Cumulative Proportions for Groups. Horizontal axes: Cumulative Proportion of Total Population. Panel notes for the top row, left to right: P = 0.50, G = 0.000, Y_W − Y_B = 0.500 − 0.500 = 0.000; P = 0.50, G = 0.900, Y_W − Y_B = 0.725 − 0.275 = 0.450; P = 0.50, G = 1.000, Y_W − Y_B = 0.750 − 0.250 = 0.500.]


The graph that results from plotting these values as described is similar to the 
segregation curve in one key respect; under conditions of exact even distribution, 
the curves for the White and Black population will coincide with the diagonal line 
for the total population. So the diagonal is a reference point for even distribution. A 
key difference from the segregation curve is that under conditions of uneven distri- 
bution, the curve for the cumulating proportion of the Black population will rise 
above the diagonal and the curve for the cumulating proportion of the White popula- 
tion will fall below the diagonal. Like the segregation curve, the areas between the 
curves and the diagonal in this graph have relationships to the values of G and 
D. This should not be surprising since the information plotted is very similar to the 
information plotted in the segregation curve. However, the visual representation 
here is distinct. 

One feature of this graphical device is that the diagonal directly reflects relative 
rank position on area proportion White (p). Thus, the contrast between the diagonal 
and the curves for Whites and Blacks provides a basis for grasping their differences 
in relative rank position. A curve that rises above the diagonal is skewed toward 
below average rank positions. A curve that falls below the diagonal is skewed 
toward above average rank positions. The implications of the curves for group 
means on relative rank position are depicted graphically by plotting two vertical 
lines; one indicates the value of mean relative rank for Whites (Y_W) and the other
indicates mean relative rank for Blacks (Y_B). Under conditions of exact even
distribution, these will necessarily coincide at the value of 0.50, the overall mean on
relative rank for area proportion White (p). Where these two values differ, the value
for Y_W exceeds 0.50 and is necessarily higher than the value of Y_B, which falls below
0.50. As noted earlier, the logical range for Y_W is from 0.5 to Q + (P/2) and the
logical range of Y_B is from Q/2 to 0.5, and the maximum value for (Y_W − Y_B) is 0.5,
which occurs under complete segregation.
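
The relationship between mean relative rank and G can be illustrated with a short Python sketch. It rests on assumptions that are not taken from the text: the area counts are hypothetical, the function names are illustrative, and y is scored as the midpoint percentile of each area's residents after ordering areas on p (one common way to handle areas as tied blocks). G is computed from the standard mean-absolute-difference form so that it can be compared with 2(Y_W − Y_B).

```python
def gini_index(w, b):
    """G as the population-weighted mean absolute difference in p, scaled by 2PQ."""
    t = [wi + bi for wi, bi in zip(w, b)]
    p = [wi / ti for wi, ti in zip(w, t)]
    T = sum(t)
    P = sum(w) / T
    pair_sum = sum(ti * tj * abs(pi - pj)
                   for ti, pi in zip(t, p)
                   for tj, pj in zip(t, p))
    return pair_sum / (2 * T * T * P * (1 - P))


def mean_relative_rank(w, b):
    """Return (Y_W, Y_B), the group means on relative rank position y."""
    # Order areas from low to high on area proportion White (p).
    areas = sorted(zip(w, b), key=lambda a: a[0] / (a[0] + a[1]))
    W = sum(wi for wi, _ in areas)
    B = sum(bi for _, bi in areas)
    T = W + B
    below = 0  # persons living in lower-ranked areas
    sum_w = sum_b = 0.0
    for wi, bi in areas:
        ti = wi + bi
        y = (below + ti / 2) / T  # assumed midpoint percentile score for the area
        sum_w += wi * y
        sum_b += bi * y
        below += ti
    return sum_w / W, sum_b / B


whites = [5, 20, 60, 90, 125]  # hypothetical counts
blacks = [95, 60, 30, 10, 5]
y_w, y_b = mean_relative_rank(whites, blacks)
g = gini_index(whites, blacks)
print(round(y_w - y_b, 4), round(g / 2, 4))

# Complete segregation with P = 2/3 and Q = 1/3:
# the logical limits are Y_B = Q/2 and Y_W = Q + P/2.
seg_y_w, seg_y_b = mean_relative_rank([0, 100], [50, 0])
print(round(seg_y_w, 4), round(seg_y_b, 4))
```

Under the midpoint scoring assumed here, the difference of group means equals G/2 for the hypothetical city, and the completely segregated case lands on the logical limits Q/2 and Q + (P/2) stated above.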

The graphs in the figure are organized by two rows and three columns. The three 
columns are for three conditions for segregation. The graphs in the first (leftmost) 
column are for the extreme condition of exact even distribution where the value of 
G is 0. The graphs in the third (rightmost) column are for the opposite extreme condition
of complete segregation where the value of G is 1.000. The graphs in the middle
column are for substantial, but not complete, segregation where the value of G
is 0.900.⁹ The two rows are for two conditions of city racial composition. The top
row is for a city where P and Q are both 0.50. The bottom row is for a city where P 
is 0.80 and Q is 0.20. 


⁹ These segregation curves are based on simulated data generated using the hyperbola model for the
segregation curve described in Duncan and Duncan (1955: 214).

The graphs on both rows of the first column look the same. This is because under
conditions of even distribution cpt_i = cpw_i = cpb_i and the graph will necessarily
consist of three identical diagonal lines rising from the lower left to the top right, and
this pattern holds regardless of the values of P and Q. Similarly, the vertical lines
depicting the values of Y_W and Y_B coincide and both are plotted at the value of 0.50.

When segregation exists, each of the three curves will be distinct. This is seen in 
the two graphs in the middle column of the figure which are for examples where the 
value of G is 0.900. The diagonal lines in the two graphs are produced by plotting
cpt_i against itself. Because areas are ordered from low to high on area proportion
White (p), the curves plotting cpb_i by cpt_i rise faster than the diagonals. In contrast,
the curves plotting cpw_i by cpt_i rise slower than the diagonals. The vertical lines in
these graphs indicate that, as noted above, the means on relative rank (y) for Blacks
(Y_B) are below 0.50 and the means on relative rank (y) for Whites (Y_W) are above
0.50. The variation in location in the top and bottom rows documents how the particular
values of the group means depend not only on the level of segregation
involved but also on the values of P and Q. In both cases, however, the difference of
means Y_W − Y_B is 0.450 and is equal to G/2.

The graphs in the third (rightmost) column depict the extreme condition of complete
segregation where G is 1.00. Again the diagonal lines in the graphs reflect the
curves plotting cpt_i by cpt_i. The curves plotting cpb_i by cpt_i rise from 0.0 when cpt_i
is 0.0 to 1.0 when cpt_i is Q (which is 0.5 in the top graph and 0.2 in the bottom graph)
and then remain at 1.0 until cpt_i is 1.0. The curves plotting cpw_i by cpt_i stay at 0.0
until cpt_i reaches Q, then climb to 1.0 when cpt_i reaches 1.0. Here the vertical lines
depicting the means on relative rank (y) for Blacks (Y_B) are at the value Q/2, which
is 0.25 in the top graph and 0.10 in the bottom graph. In contrast, the vertical lines
depicting means on relative rank (y) for Whites (Y_W) are at the value Q + P/2, which
is 0.75 in the top graph and 0.60 in the bottom graph. In both of these example cases,
the difference between the two means is 0.5, the maximum possible value the difference
can take. This is one half of G’s maximum value of 1.0, consistent with the
relationship in Expression (C.1).

The graphs in Fig. C.6 illustrate an important implication of expressions (C.4b)
and (C.4c); namely, that the height of the curves for cpb_i and cpw_i at a given value
of cpt_i will depend on two factors. One, obviously, is the extent of segregation
between Whites and Blacks. That is made clear by the progression across columns
for either row of the figure. The other factor is the relative sizes of the groups in the
comparison; that is, the ratio of P and Q. That is made clear by how the curves for
cpb_i and cpw_i, and the group means associated with these curves (plotted as vertical
lines), differ with the value of P.

I offer one last set of comments on the graphs in this figure. G and D have definite
relationships to the graphs in Fig. C.6. The area between the curve plotting cpb_i
by cpt_i and the diagonal equals the value of G for the comparison of Blacks against
total (G_TB). The area between the curve plotting cpw_i by cpt_i and the diagonal equals
the value of G for the comparison of Whites against total (G_TW). The sum of these
two determines the value of G for the comparison of Whites to Blacks. Specifically,
G is given by the ratio of the sum of these two areas to 0.5, the maximum possible
value for the sum. D is equal to the maximum vertical distance between the curves
for cpb_i and cpw_i and, exactly as is the case for the segregation curve, this value
is seen at the last area where p_i < P.
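
The construction of the three cumulative curves, and the reading of D as the largest vertical gap between the Black and White curves, can be sketched as follows. This is an illustration under stated assumptions: the area counts are hypothetical and the function name `cumulative_curves` is not from the text.

```python
from itertools import accumulate


def cumulative_curves(w, b):
    """Return (cpw, cpb, cpt) after ordering areas from low to high on p."""
    areas = sorted(zip(w, b), key=lambda a: a[0] / (a[0] + a[1]))
    ws = [wi for wi, _ in areas]
    bs = [bi for _, bi in areas]
    ts = [wi + bi for wi, bi in areas]
    W, B, T = sum(ws), sum(bs), sum(ts)
    cpw = [x / W for x in accumulate(ws)]  # cumulated proportion of Whites
    cpb = [x / B for x in accumulate(bs)]  # cumulated proportion of Blacks
    cpt = [x / T for x in accumulate(ts)]  # cumulated proportion of total
    return cpw, cpb, cpt


whites = [5, 20, 60, 90, 125]  # hypothetical counts
blacks = [95, 60, 30, 10, 5]
cpw, cpb, cpt = cumulative_curves(whites, blacks)

# Vertical gap between the Black and White curves at each area boundary.
gaps = [cb - cw for cb, cw in zip(cpb, cpw)]
d_from_curves = max(gaps)
area_of_max = gaps.index(d_from_curves)  # zero-based position in the ordering
print(round(d_from_curves, 4), area_of_max)
```

In this hypothetical city P is 0.60, and the largest gap falls at the second area in the ordering (p = 0.25), which is the last area where p_i < P.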

One implication I stress here is that the segregation curve, while familiar and 
appealing in many ways, is not the only graphical device for comparing group dis- 
tributions over areas ranked on area proportion White (p). The graphs presented here 
contain the same information as the segregation curve and like the segregation curve 
they support a geometric interpretation of the values of G and D. In addition, they 
provide a more direct basis for assessing group differences on residential outcomes 
(y) that are scored to reflect relative rank position on area proportion White (p). 


The Nature of the Y-P Relationship for G 


The nature of the y-p relationship for the Gini Index (G) is complex and difficult to 
summarize. Since the relationship is based on a relative rank (percentile or quantile) 
transformation, the y-p relationship is monotonic and positive. But few general 
statements beyond that can be offered. 

I have explored the relationship by performing simulation studies to gain insight 
into the nature of the y-p relationship. I cannot provide a full review of these explo- 
rations here. But I will provide a brief summary of key points. The simulations 
assumed a model city with the following characteristics. It has 1000 neighborhoods 
with 10,000 persons in each neighborhood and only two groups — Whites and 
Blacks. I populated individual neighborhoods based on a model segregation curve; 
specifically, a segregation curve defined by the “hyperbola model” described in 
Duncan and Duncan (1955: 213-215). By using the hyperbola model I was able to 
establish particular values of G in a given simulation and thus can vary city racial 
composition (P) and the value of G independently across simulation trials. 

Each unique combination of values for P and G produces a unique distribution of 
Whites and Blacks across the neighborhoods of the city. Based on the resulting 
distributions, I calculated the scores of p and y for each neighborhood using proce- 
dures outlined earlier. I then performed graphical analyses to gain insight into how 
the y-p relationship varies across different combinations of values for P and G. 
I offer the following to summarize key findings from my explorations. 


• The relationship between y (relative rank position on p) and p is always
  nonlinear.

• The value of y always increases as p increases but generally rises faster (has a
  steeper slope) at the beginning and at the end and rises slower (has a shallower
  slope) in between.

• The nonlinear y-p relationship is variable, not fixed. Its exact form varies with
  the values of city racial composition (P) and the value of G.

• City racial composition (P) determines whether the y-p relationship is symmetrical
  or asymmetrical. It is symmetrical when P is 50 and increasingly asymmetrical
  as P departs further from 50.

• The value of G determines whether the nonlinearity in the y-p relationship
  described above is mild or pronounced. When G is high, the “steeper” portions of
  the y-p curve occur over short ranges on p and the “flatter” portion of the y-p
  curve occurs over an extended range of p. As the value of G declines, the “flatter”
  portion of the y-p curve becomes less distinct from the “steeper” portions of the
  curve.


I conclude this discussion by describing how the principles just listed play out in 
selected example cases. I start with an example for a hypothetical “City A” where 
the racial composition of the city is balanced (i.e., P = 50) and the level of segrega- 
tion as measured by G is high (i.e., G = 90 ). As shown in the top panel of Appendix 
Fig. C.7, the y-p relationship is symmetrical (because P is 50) and strongly nonlinear
(because G is high). Specifically, y rises rapidly over a short portion of the lower
range for p (p = 0-15); y then rises slowly over an extended portion of the intermediate
range of p (p = 15-85); and y then rises rapidly again over a short portion
of the upper range of p (p = 85-100). More specifically, y increases about 40
points over the range of 0-20 for p, then increases only 20 points over the range of
20-80 for p, and then increases another 40 points over the range of 80-100 for p.

Fig. C.7 Examples of y-p relationship under varying combinations of G and P

The example labeled City B lowers G to 60 but leaves P unchanged at 50. The
resulting y-p curve is shown in the lower left panel of the figure. The relationship
remains symmetrical, as in City A, because P is 50. But the lower value for G produces
a weaker nonlinear relationship, evident in the fact that the differences
between the steeper and flatter portions of the curve now are smaller. The example
labeled City C leaves G unchanged from City A, but increases P to 85, a value more
typical for US urban areas. The y-p curve continues to have distinct steep and flat
portions as in City A. But now the curve is asymmetrical with most of the rise in y
taking place over the last portion of the range of p (p = 90-100).

The pattern seen in City C becomes even more dramatic when relative minority 
group size is at low levels (i.e., below 5) and P is high. This provides a basis for 
understanding a finding that is discussed in Chaps. 6, 7, and 8 of the main text. The 
finding is that scores for G and D can be and often are much higher than scores for 
S when the two groups in the comparison are imbalanced in size. As the pattern for
City C shows, this possibility arises because the two groups can differ by relatively
small amounts on p (the area outcome that determines S) and at the same time can
differ by large amounts on y as scored for G and D. The pattern for City A, and
especially the pattern for City B, yields insight into why discrepancies between G
and D in comparison with S tend to be much smaller when city racial composition
is balanced.


Appendix D: Establishing the Scaling Function y=f (p) 
Needed to Cast the Separation Index (S) as a Difference 
of Group Means on Scaled Pairwise Contact 


In this appendix I establish the scaling function y = f(p) that accomplishes the goal
of scoring residential outcomes (y) from area group proportions (p) such that the
scores for y fall over the range 0-1 and yield the value of the separation index (S) as
a difference of means on y for the two groups in the segregation comparison. The
end result is that, in the example of using S to assess White-Black segregation,
S = Y_W − Y_B where Y_W and Y_B are the group means for Whites and Blacks, respectively,
on individual residential outcomes (y) scored from the value of the area group
proportion (p) for the areas in which the individuals reside.

The value of p for an area reflects pairwise group contact or exposure.
Accordingly, the value of y for an area can be described as reflecting scaled pairwise
group contact or exposure and the expression (Y_W − Y_B) can be described as the difference
of group means on scaled pairwise group contact. The scaling function
y = f(p) that places S in the desired difference of group means framework is developed
below. The scaling function is simple and substantively attractive. Specifically,
it is the exact one-to-one linear function f(p_i) = p_i, which means that S can be
placed in the difference of means framework without rescaling p from its original or
“natural” metric of pairwise group contact.

The separation index (S) has been known by many names including: the variance
ratio index (V, James and Taeuber 1985), the correlation ratio (r, Stearns and Logan
1986; White 1986), eta squared (η², Duncan and Duncan 1955; James and Taeuber
1985), the mean square deviation (MSD, White 1986; Zoloth 1976), r_ij (Coleman
et al. 1975), and S (Zoloth 1976; Becker et al. 1978). The index is well established
in the literature on segregation measurement and has been widely used in empirical
segregation studies for many decades. S is particularly attractive when cast in the
difference of means framework used here because S can be expressed as a difference
of means on scaled pairwise group contact where group contact is based on
area group proportion (p) in its “natural” metric, that is, without rescaling p as is
required for the other indices considered here.

As best I have been able to determine, Becker et al. (1978: 353) were the first to
show that in the two group case S can be given as the simple difference between the
focal group’s contact with itself (i.e., generically, P_XX; for White contact with
Whites, P_WW) and the comparison group’s contact with the focal group (i.e., generically,
P_YX; for Black contact with Whites, P_BW) based on

$$
\begin{aligned}
S &= P_{XX} - P_{YX} && \text{in generic form and} \\
S &= P_{WW} - P_{BW} && \text{for White-Black segregation.}
\end{aligned}
$$


Note that this relationship holds only when the population consists of only two
groups and it does not generalize to situations where the population consists of three
or more groups. The relationship can be adapted to all circumstances by restating
contact as “pairwise” contact instead of “overall” contact as follows

$$
S = P_{XX.XY} - P_{YX.XY}.
$$

Here the suffix “.XY” in the contact subscripts indicates that the contact calculations
are based only on the counts of the two groups in the segregation comparison.
Thus, P_XX.XY denotes the focal group’s pairwise contact with itself and P_YX.XY
denotes the comparison group’s pairwise contact with the focal or “reference”
group.

For White-Black segregation, conventional or “overall” contact indices as introduced
by Bell (1954) are given by

$$
P_{WW} = (1/W) \sum w_i p_i = (1/W) \sum w_i \,(w_i / t_i)
$$

for White contact with Whites and

$$
P_{BW} = (1/B) \sum b_i p_i = (1/B) \sum b_i \,(w_i / t_i)
$$

for Black contact with Whites. The corresponding pairwise contact indices are
given as follows.

$$
\begin{aligned}
P_{WW.WB} &= (1/W) \sum w_i p_i = (1/W) \sum w_i \,\bigl(w_i / (w_i + b_i)\bigr), \text{ and} \\
P_{BW.WB} &= (1/B) \sum b_i p_i = (1/B) \sum b_i \,\bigl(w_i / (w_i + b_i)\bigr)
\end{aligned}
$$

The key difference between overall and pairwise contact is that
t_i ≠ (w_i + b_i) when the population includes groups other than Whites and Blacks.

All popular indices of uneven distribution are usually applied as “pairwise”
measures. That is, their calculations draw only on counts for the two groups in the
segregation comparison. So formulating contact indices in this way is not unusual.
One simply must bear in mind that contact in this formulation is interpreted in terms
of the pair of groups involved in the comparison. When the population also includes
groups other than Whites and Blacks, the separation index is given by

$$
S = P_{WW.WB} - P_{BW.WB}
$$

where P_WW.WB is Whites’ average pairwise contact with Whites and P_BW.WB is Blacks’
average pairwise contact with Whites. When the population consists only of Whites
and Blacks, the same expression obviously continues to hold but the “.WB” subscript
is not necessary.

The distinction between overall and pairwise contact is important but it is cumbersome.
Since all indices of uneven distribution are based on pairwise comparisons,
I drop the “.XY” suffix notation from this point forward. Thus, for convenience,
the expression

$$
S = P_{WW} - P_{BW}
$$

indicates a pairwise construction unless otherwise noted. Likewise, pairwise constructions
are assumed for city and area proportion White (P and p_i, given respectively
by P = W/(W + B) and p_i = w_i/[w_i + b_i]) and city and area proportion
Black (Q and q_i, given respectively as Q = B/(W + B) and q_i = b_i/[w_i + b_i]).
These conventions are in keeping with the literature on segregation measurement
which lets context dictate when area proportion White (p_i) should be computed using
“overall” calculations (i.e., p_i = w_i/t_i) or “pairwise” calculations (i.e.,
p_i = w_i/[w_i + b_i]).
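
The distinction between overall and pairwise contact is easy to see in a small numerical sketch. The Python below assumes a hypothetical three-group city; the counts, the `others` list, and the helper name `contact_with_whites` are illustrative and not from the text. It computes both versions of P_WW and P_BW and confirms that the pairwise difference matches the variance-ratio computing formula for S applied to pairwise counts.

```python
# Hypothetical three-group city (illustrative counts only).
whites = [80, 50, 10]
blacks = [10, 30, 60]
others = [10, 20, 30]  # a third group, so t_i != w_i + b_i


def contact_with_whites(group, w, denom):
    """Average of w_i / denom_i experienced by members of `group`."""
    return sum(g * wi / d for g, wi, d in zip(group, w, denom)) / sum(group)


overall_totals = [w + b + o for w, b, o in zip(whites, blacks, others)]
pairwise_totals = [w + b for w, b in zip(whites, blacks)]

# "Overall" contact: p_i = w_i / t_i with t_i counting all groups.
p_ww_overall = contact_with_whites(whites, whites, overall_totals)
p_bw_overall = contact_with_whites(blacks, whites, overall_totals)

# "Pairwise" contact: p_i = w_i / (w_i + b_i).
p_ww_pair = contact_with_whites(whites, whites, pairwise_totals)
p_bw_pair = contact_with_whites(blacks, whites, pairwise_totals)
s_pairwise = p_ww_pair - p_bw_pair

# Variance-ratio form of S on pairwise counts: (sum t_i p_i^2 - T P^2) / (T P Q).
T = sum(pairwise_totals)
P = sum(whites) / T
Q = 1 - P
sum_tp2 = sum(t * (w / t) ** 2 for w, t in zip(whites, pairwise_totals))
s_variance_ratio = (sum_tp2 - T * P ** 2) / (T * P * Q)

print(round(p_ww_overall - p_bw_overall, 4), round(s_pairwise, 4),
      round(s_variance_ratio, 4))
```

In this example the overall difference P_WW − P_BW departs from S, while the pairwise difference reproduces it, which is why the pairwise construction is assumed from this point forward.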

To conclude this discussion, the separation index (S) can be given as the group
difference of means on average pairwise contact with the reference group. In the
case of White-Black segregation, S = P_WW − P_BW. The terms P_WW and P_BW assess
White and Black group averages on area proportion White (p_i). Setting residential
outcomes (y_i) to the value of area proportion White (p_i) allows one to place S in the
notation of the difference of means framework, restating it as S = Y_W − Y_B. The next
sections review terms from the “variance ratio” formulation of S and then demonstrate
that the difference of means formulation of S and the variance ratio formulation
of S are equivalent.


Variance Analysis 


I now consider the relationship S = η² in more detail. I acknowledge that the expressions
and relationships I introduce below are not particularly original. They have
been noted elsewhere including, for example, in papers by Becker et al. (1978:
353) and White (1986: 207) and also in statistical texts such as Blalock (1979: 81).
The contribution of the discussion here is that it collects and calls attention to points
not emphasized in most previous discussions.

Duncan and Duncan (1955) noted that the separation index (S) (which they
termed the variance ratio) is equivalent to the eta squared (η²) statistic from analysis
of variance. More specifically, S is equal to η² for the analysis of how X, an
individual-level binomial variable for race (coded 1 for Whites and 0 for Blacks),
varies over areas. The value of S thus indicates the proportion of variation in race
(X) that is “explained” by area of residence. Under even distribution S will be 0
because the representation of Whites and Blacks in each area will exactly reflect
each group’s representation in the city overall and knowledge of area will not
improve the prediction of race above the baseline of assuming the overall city average.
Under complete segregation S will be 1 because area of residence will be
homogeneous (either all White or all Black) and thus area will perfectly predict
race. Intermediate success in prediction is quantified as the ratio BSS/TSS from
analysis of variance where BSS is the “between group sum of squares” for individual
deviations from the overall mean and TSS is “total sum of squares” for
individual deviations from the overall mean. The overall mean for X is the proportion
White in the city population (P) so TSS = Σ(X_k − P)² with k used here to
index individuals. Predictions for X are based on category means for X which in
this case are equal to area proportion White (p_i) so BSS = Σ t_i (p_i − P)² with i here
serving to index areas. Finally, for completeness, inability to explain X is quantified
by WSS/TSS where WSS is the “within group sum of squares” given by
WSS = ΣΣ(X_ik − p_i)².

It is useful to note at this point that the value of η² also is equal to the square of
the individual-level bivariate correlation of race (X) and area proportion White (p_i).
Thus, one can interpret S as indicating the degree to which race determines area
proportion White (p) for individuals as quantified by r² from the regression of p_i on
X or by η² from the analysis of how p_i varies by race. Either way, it is clear that the
value of S revolves around the impact of race on contact with Whites at the individual
level as reflected in the White-Black difference of means in contact with
Whites (p_i). Under even distribution S will be 0 because all p_i = P so
the White and Black means for contact with Whites (p_i) are the same and knowledge
of race will not improve the prediction of contact with Whites (p) above the baseline
of assuming the overall city average (P). Under complete segregation S will be 1
because race will perfectly predict contact with Whites with all Whites living in
areas where p_i = 1 and all Blacks living in areas where p_i = 0.
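
The three readings of S described here (η² for race by area, r² for race and area proportion White, and the White-Black difference of means on p) can be compared on individual-level records. The following Python sketch assumes hypothetical area counts and an illustrative function name; it expands the counts to one record per person and computes each quantity directly. It requires some unevenness in the data, since r² is undefined when every p_i equals P.

```python
def separation_three_ways(w, b):
    """Return (r_squared, eta_squared, mean_difference) from person-level records."""
    X, p = [], []  # X = race (1 White, 0 Black); p = proportion White in person's area
    for wi, bi in zip(w, b):
        pi = wi / (wi + bi)
        X += [1] * wi + [0] * bi
        p += [pi] * (wi + bi)
    n = len(X)
    mean_x = sum(X) / n  # equals P
    mean_p = sum(p) / n  # also equals P
    cross = sum((x - mean_x) * (q - mean_p) for x, q in zip(X, p))
    tss = sum((x - mean_x) ** 2 for x in X)  # total sum of squares for X
    bss = sum((q - mean_p) ** 2 for q in p)  # between-area sum of squares
    r_squared = cross ** 2 / (tss * bss)
    eta_squared = bss / tss
    mean_p_white = sum(q for x, q in zip(X, p) if x == 1) / sum(X)
    mean_p_black = sum(q for x, q in zip(X, p) if x == 0) / (n - sum(X))
    return r_squared, eta_squared, mean_p_white - mean_p_black


# Hypothetical two-group city (illustrative counts only).
r2, eta2, mean_diff = separation_three_ways([80, 50, 10], [10, 30, 60])
print(round(r2, 4), round(eta2, 4), round(mean_diff, 4))
```

All three values coincide for these counts, in line with the claim that S can be read either as explained variance or as the effect of race on contact with Whites.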

The more general relationship including intermediate outcomes is set forth in
more detail below. Relevant relationships from analysis of variance can be summarized
as follows.

$$
\begin{aligned}
TSS &= BSS + WSS \\
\eta^2 &= BSS / TSS \\
\eta^2 &= 1 - WSS / TSS \\
TSS &= \sum_i \sum_k (X_{ik} - P)^2 \\
WSS &= \sum_i \sum_k (X_{ik} - p_i)^2 \\
BSS &= \sum_i t_i \,(p_i - P)^2
\end{aligned}
$$

with “i” serving as an index of areas and “k” serving as an index of individuals
within areas.

The following expressions are adapted from discussions in White (1986: 207)
and Becker et al. (1978: 353) and indicate how TSS, WSS, and BSS also can be
obtained from terms that are found in standard computing formulas for S.

$$
\begin{aligned}
TSS &= TPQ \\
BSS &= \sum t_i p_i^2 - TP^2 \\
WSS &= \sum t_i p_i q_i \\
BSS / TSS &= (1/TPQ)\Bigl(\sum t_i p_i^2 - TP^2\Bigr)
\end{aligned}
$$
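The basis for the three expressions is established as follows. First, the equivalence
of TSS and TPQ can be established as follows based on Whites and Blacks being
scored 1 and 0, respectively, on race (X).

$$
\begin{aligned}
TSS &= \sum\sum (X_{ik} - P)^2 && \text{(a standard formula for TSS)} \\
&= W(1 - P)^2 + B(0 - P)^2 && \text{(restate as separate operations for Whites and Blacks)} \\
&= TP(1 - P)^2 + TQ(0 - P)^2 && \text{(replace } W \text{ with } TP \text{ and } B \text{ with } TQ) \\
&= TPQ^2 + TQ(0 - P)^2 && \text{(replace } (1 - P)^2 \text{ with } Q^2) \\
&= TPQ^2 + TQP^2 && \text{(replace } (0 - P)^2 \text{ with } P^2) \\
&= TPQ\,(Q) + TPQ\,(P) && \text{(reorganize terms)} \\
&= TPQ\,(Q + P) && \text{(reorganize terms)} \\
TSS &= TPQ && \text{(based on } Q + P = 1)
\end{aligned}
$$

Next, the equivalence of WSS and Σ t_i p_i q_i can be established as follows.

$$
\begin{aligned}
WSS &= \sum\sum (X_{ik} - p_i)^2 && \text{(standard formula for WSS)} \\
&= \sum w_i (1 - p_i)^2 + \sum b_i (0 - p_i)^2 && \text{(restate as separate operations for Whites and Blacks)} \\
&= \sum t_i p_i (1 - p_i)^2 + \sum t_i q_i (0 - p_i)^2 && \text{(replace } w_i \text{ with } t_i p_i \text{ and } b_i \text{ with } t_i q_i) \\
&= \sum t_i p_i q_i^2 + \sum t_i q_i (0 - p_i)^2 && \text{(replace } 1 - p_i \text{ with } q_i) \\
&= \sum t_i p_i q_i^2 + \sum t_i q_i p_i^2 && \text{(replace } (0 - p_i)^2 \text{ with } p_i^2) \\
&= \sum t_i p_i q_i\,(q_i) + \sum t_i p_i q_i\,(p_i) && \text{(reorganize terms)} \\
&= \sum t_i p_i q_i\,(q_i + p_i) && \text{(reorganize terms)} \\
WSS &= \sum t_i p_i q_i && \text{(based on } q_i + p_i = 1)
\end{aligned}
$$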


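Then the equivalence of BSS and $\sum_i t_i p_i^2 - TP^2$ can be established as follows. 

\[
\begin{aligned}
BSS &= \sum_i t_i (p_i - P)^2 && \text{(standard formula for BSS)} \\
&= \sum_i t_i (p_i^2 - 2 p_i P + P^2) && \text{(multiply out $(p_i - P)^2$)} \\
&= \sum_i t_i p_i^2 - \sum_i t_i 2 p_i P + \sum_i t_i P^2 && \text{(reorganize as multiple summations)} \\
&= \sum_i t_i p_i^2 - 2P \cdot \sum_i t_i p_i + P^2 \cdot \sum_i t_i && \text{(move constants outside of summations)} \\
&= \sum_i t_i p_i^2 - 2P \sum_i t_i p_i + TP^2 && \text{(substitute $T$ for $\sum_i t_i$)} \\
&= \sum_i t_i p_i^2 - 2P \cdot TP + TP^2 && \text{(substitute $TP$ for $\sum_i t_i p_i$ based on $P = \sum_i t_i p_i / T$)} \\
&= \sum_i t_i p_i^2 - 2TP^2 + TP^2 && \text{(reorganize terms)} \\
BSS &= \sum_i t_i p_i^2 - TP^2 && \text{(combine terms)}
\end{aligned}
\]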
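
The three equivalences can also be checked numerically. The following is a minimal 
sketch, not part of the original derivation: the area counts in `areas` are invented 
for illustration, and the variable names are hypothetical. It builds TSS, WSS, and BSS 
person by person from their definitions (with X = 1 for Whites and X = 0 for Blacks) 
and compares them with the computing formulas TPQ, $\sum_i t_i p_i q_i$, and 
$\sum_i t_i p_i^2 - TP^2$. 

```python
# Hypothetical illustration data: (White count, Black count) for each area.
areas = [(90, 10), (60, 40), (30, 70), (10, 90), (80, 20)]

W = sum(w for w, _ in areas)   # city-wide White total
B = sum(b for _, b in areas)   # city-wide Black total
T = W + B
P, Q = W / T, B / T

# Definitional sums of squares, built from individual race scores
# (X = 1 for Whites, X = 0 for Blacks).
tss = wss = bss = 0.0
for w, b in areas:
    t = w + b
    p = w / t                      # area proportion White, p_i
    xs = [1] * w + [0] * b         # race scores of the individuals in the area
    tss += sum((x - P) ** 2 for x in xs)
    wss += sum((x - p) ** 2 for x in xs)
    bss += t * (p - P) ** 2

# Computing formulas from the derivations.
tss_cf = T * P * Q
wss_cf = sum((w + b) * (w / (w + b)) * (b / (w + b)) for w, b in areas)
bss_cf = sum((w + b) * (w / (w + b)) ** 2 for w, b in areas) - T * P ** 2

eta_squared = bss_cf / tss_cf      # equals S under these definitions
print(tss, tss_cf)
print(wss, wss_cf)
print(bss, bss_cf)
```

Each printed pair should agree up to floating-point rounding, and `bss + wss` should 
reproduce `tss`. 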


From these expressions, $\eta^2$ and S can be obtained from the following computing 
formulas 

\[
\begin{aligned}
S &= \eta^2 = BSS / TSS \\
S &= \eta^2 = \left( \sum_i t_i p_i^2 - TP^2 \right) / TPQ.
\end{aligned}
\]

S also can be obtained from the simple difference between pairwise White contact with 
Whites ($P_{WW}$) and pairwise Black contact with Whites ($P_{BW}$); that is, 
$S = P_{WW} - P_{BW}$. Because $y_i$ for S is scored directly from $p_i$, 
$Y_W = P_{WW}$ and $Y_B = P_{BW}$ and the following equalities hold. 

\[
\begin{aligned}
Y_W - Y_B &= BSS / TSS \\
P_{WW} - P_{BW} &= BSS / TSS
\end{aligned}
\]

I provide a derivation establishing these equivalences below. I initially developed 
the derivation independently. However, I later discovered that a similar derivation 
had been given in Becker et al. (1978: 353). 


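\[
\begin{aligned}
S &= Y_W - Y_B && \text{(follows because $y_i = p_i$)} \\
&= P_{WW} - P_{BW} \\
&= \left( \sum_i w_i p_i \right) / W - \left( \sum_i b_i p_i \right) / B && \text{(standard expressions for $P_{WW}$ and $P_{BW}$)} \\
&= \left( \sum_i t_i p_i p_i \right) / W - \left( \sum_i t_i p_i q_i \right) / B && \text{(replace $w_i$ with $t_i p_i$ and $b_i$ with $t_i q_i$)} \\
&= \left( \sum_i t_i p_i^2 \right) / TP - \left( \sum_i t_i p_i q_i \right) / TQ && \text{(replace $W$ with $TP$ and $B$ with $TQ$)} \\
&= (Q/Q) \left( \sum_i t_i p_i^2 \right) / TP - (P/P) \left( \sum_i t_i p_i q_i \right) / TQ && \text{(introduce 1 in the form of $Q/Q$ and $P/P$)} \\
&= \left( Q \cdot \sum_i t_i p_i^2 \right) / TPQ - \left( P \cdot \sum_i t_i p_i q_i \right) / TPQ && \text{(reorganize terms)} \\
&= \left( Q \cdot \sum_i t_i p_i^2 - P \cdot \sum_i t_i p_i q_i \right) / TPQ && \text{(reorganize terms)} \\
&= \left[ Q \cdot \sum_i t_i p_i^2 - P \cdot \sum_i t_i p_i (1 - p_i) \right] / TPQ && \text{(reorganize terms)} \\
&= \left[ Q \cdot \sum_i t_i p_i^2 - \left( P \cdot \sum_i t_i p_i - P \cdot \sum_i t_i p_i^2 \right) \right] / TPQ && \text{(restate $P \cdot \sum_i t_i p_i (1 - p_i)$ as $P \cdot \sum_i t_i p_i - P \cdot \sum_i t_i p_i^2$)} \\
&= \left( Q \cdot \sum_i t_i p_i^2 + P \cdot \sum_i t_i p_i^2 - P \cdot \sum_i t_i p_i \right) / TPQ && \text{(reorganize terms)} \\
&= \left[ (P + Q) \cdot \sum_i t_i p_i^2 - P \cdot \sum_i t_i p_i \right] / TPQ && \text{(reorganize terms)} \\
&= \left( \sum_i t_i p_i^2 - P \cdot \sum_i t_i p_i \right) / TPQ && \text{($P + Q = 1$ and drops out)} \\
&= \left( \sum_i t_i p_i^2 - P \cdot TP \right) / TPQ && \text{(substitute $TP$ for $\sum_i t_i p_i$)} \\
&= \left( \sum_i t_i p_i^2 - TP^2 \right) / TPQ && \text{(reorganize terms)} \\
S &= BSS / TSS && \text{(substitute $BSS$ for $\sum_i t_i p_i^2 - TP^2$ and $TSS$ for $TPQ$ as established earlier)}
\end{aligned}
\]

As a last comment, I note that the discussion here shows that S simultaneously 
registers two separate and distinct aspects of the relationship between race and 
contact with Whites (p). 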


• Under the traditional eta squared or variance ratio interpretation, S indicates the 
  strength of the association between race (i.e., group membership) and contact 
  with Whites (p). 

• Under the new interpretation of S as a difference of group means for contact with 
  Whites, S indicates the “impact” or “effect” of race (i.e., group membership) on 
  contact with Whites. 


Thus, S equals both the regression coefficient (b) for race and the square of the 
correlation coefficient (r) from the bivariate regression analysis predicting contact 
with Whites ($p_i$) based on race (X). Interestingly, both options allow for applying 
significance tests for the value of S. 
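
The coincidence of the regression coefficient, the squared correlation, and the 
difference of group means can be illustrated with a small calculation. This is a 
sketch under stated assumptions rather than a procedure taken from the text: the 
area counts are invented, and the individual-level records are generated by giving 
every resident of an area the area's proportion White as the outcome score. 

```python
# Hypothetical illustration data: (White count, Black count) for each area.
areas = [(90, 10), (60, 40), (30, 70), (10, 90), (80, 20)]

# One record per person: x = 1 for Whites, 0 for Blacks; y = p_i of the person's area.
records = []
for w, b in areas:
    p = w / (w + b)
    records += [(1, p)] * w + [(0, p)] * b

n = len(records)
mean_x = sum(x for x, _ in records) / n
mean_y = sum(y for _, y in records) / n
sxx = sum((x - mean_x) ** 2 for x, _ in records)
syy = sum((y - mean_y) ** 2 for _, y in records)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in records)

slope = sxy / sxx                    # bivariate regression coefficient b for race
r_squared = sxy ** 2 / (sxx * syy)   # squared correlation between race and contact

# Difference of group means on contact with Whites (P_WW - P_BW).
whites = [y for x, y in records if x == 1]
blacks = [y for x, y in records if x == 0]
s_diff = sum(whites) / len(whites) - sum(blacks) / len(blacks)

# Variance-ratio computing formula: (sum t_i p_i^2 - T P^2) / (T P Q).
T = n
P = mean_x
Q = 1 - P
eta_squared = (sum((w + b) * (w / (w + b)) ** 2 for w, b in areas) - T * P ** 2) / (T * P * Q)

print(slope, r_squared, s_diff, eta_squared)
```

All four printed values should agree up to rounding, which is the sense in which S 
serves as both an effect measure and a measure of association. 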


Appendix E: Establishing the Scaling Function y=f (p) Needed 
to Cast the Theil Entropy Index (H) as a Difference of Group 
Means on Scaled Pairwise Contact 


In this appendix I establish the scaling function y=f (p) that accomplishes the goal 
of scoring residential outcomes (y) from area group proportions (p) such that the 
scores for y fall over the range 0-1 and yield the value of the Theil entropy index 
(H) as a difference of means on y for the two groups in the segregation comparison. 
The end result is that, in the example of using H to assess White-Black segregation, 
$H = Y_W - Y_B$, where $Y_W$ and $Y_B$ are the group means for Whites and Blacks, 
respectively, on individual residential outcomes (y) scored from the value of the 
area group proportion (p) for the areas in which the individuals reside. 

The scaling function y = f(p) that places H in the difference of group means 
framework is developed below. Discussion of this function in the main body of this 
monograph notes that y is a smooth, continuous, nonlinear transformation of p that 
changes p from its original or “natural” metric to a new metric that exaggerates 
group differences on p over portions of the lower and upper ranges of p (i.e., roughly 
p < 0.25 and p > 0.75) and compresses group differences on p over middle portions 
of the range of p (i.e., roughly 0.30 < p < 0.70). 

Please note that the primary credit for discovering the scaling function for H 
should be given to Warner Henson, II. Warner derived the first version of the 
scaling function for H while working with me as an undergraduate research fellow 
completing his BS in sociology at Texas A&M University.[10] I have subsequently 
added refinements and extensions to his work to serve the needs of this monograph, 
but these are minor changes. Mr. Henson established the essential features of the 
derivation. 

[10] That was in 2007. Soon after, Mr. Henson graduated and enrolled in the Sociology 
doctoral program at Stanford University. 

Continuing with the familiar example of White-Black segregation, a basis for 
scoring residential outcomes (y) such that the scores of y fall over the same range as 
p (i.e., 0-1) and yield the Theil index (H) as the difference of means ($Y_W - Y_B$) can 
be established as follows. First, start with the desired equivalence 

\[
H = Y_W - Y_B = (1/T) \cdot \sum_i t_i (E - e_i) / E.
\]

The expression on the far right side is an adaptation of the formula for H given in 
James and Taeuber (1985). Next replace the terms $Y_W$ and $Y_B$ with alternative 
computing expressions as follows 

\[
(1/W) \cdot \sum_i w_i y_i - (1/B) \cdot \sum_i b_i y_i = (1/T) \cdot \sum_i t_i (E - e_i) / E.
\]

Then replace W and B with alternative expressions based on T, P, and Q. Specifically, 
replace W with PT and replace B with QT. Similarly, replace $w_i$ and $b_i$ with 
alternative expressions based on $t_i$, $p_i$, and $q_i$. Specifically, replace $w_i$ 
with $p_i t_i$ and $b_i$ with $q_i t_i$. This yields 

\[
(1/PT) \cdot \sum_i p_i t_i y_i - (1/QT) \cdot \sum_i q_i t_i y_i = (1/T) \cdot \sum_i t_i (E - e_i) / E.
\]

Then rearrange terms as follows 

\[
\begin{aligned}
(1/T) \cdot \sum_i (p_i/P) t_i y_i - (1/T) \cdot \sum_i (q_i/Q) t_i y_i &= (1/T) \cdot \sum_i t_i (E - e_i) / E \\
\sum_i (p_i/P) t_i y_i - \sum_i (q_i/Q) t_i y_i &= \sum_i t_i (E - e_i) / E \\
\sum_i t_i y_i \left[ (p_i/P) - (q_i/Q) \right] &= \sum_i t_i (E - e_i) / E \\
\sum_i t_i y_i &= \sum_i t_i \left[ (E - e_i) / E \right] / (p_i/P - q_i/Q).
\end{aligned}
\]

From the above expression, it is evident that 

\[
y_i = \left[ (E - e_i) / E \right] / (p_i/P - q_i/Q).
\]

For actual calculations, E and $e_i$ would be expanded to their full expressions using 
the following substitutions 

\[
E = P \cdot \ln(1/P) + Q \cdot \ln(1/Q), \text{ and}
\]

\[
e_i = p_i \cdot \ln(1/p_i) + q_i \cdot \ln(1/q_i).
\]

Adjusting the Range to 0-1 


At this point a small additional adjustment is needed. The scores for $y_i$ will yield 
H as a difference of group means, thus achieving one important goal of the exercise. 
However, the scores for y will not fall in the range 0-1. They instead will fall in the 
range -Q to P as $p_i$ varies from its minimum value of 0 to its maximum value of 1. 
This is because, when $p_i$ is either 0 or 1, $e_i$ evaluates to 0 and the term 
$(E - e_i)/E$ evaluates to 1. This reduces the expression 

\[
y_i = \left[ (E - e_i) / E \right] / (p_i/P - q_i/Q)
\]

to 

\[
y_i = 1 / \left[ (p_i/P) - (q_i/Q) \right].
\]

When $p_i$ is 0, this expression becomes 

\[
y_i = 1 / \left[ (0/P) - (1/Q) \right]
\]

which evaluates to -Q. Similarly, when $p_i$ is 1, the resulting expression is 

\[
y_i = 1 / \left[ (1/P) - (0/Q) \right]
\]

which evaluates to P. 
The range for y can therefore be set to 0-1 by incorporating the constant Q in the 
function as follows 

\[
y_i = Q + \left[ (E - e_i) / E \right] / (p_i/P - q_i/Q).
\]

Because P + Q = 1, the endpoints -Q and P become 0 and 1. Adding the same constant 
to every score shifts both group means equally, so the difference $Y_W - Y_B$ still 
equals H. This achieves the desired solution. 
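
The scaling function can be tried out on a small example. The sketch below is an 
illustration only: the area counts are invented, and the helper names `entropy` and 
`y_theil` are hypothetical. It scores each area with 
$y_i = Q + [(E - e_i)/E]/(p_i/P - q_i/Q)$, takes the White and Black means of those 
scores, and compares their difference with H computed from the aggregate formula. 
The convention that $0 \cdot \ln(1/0)$ is treated as 0 is assumed. 

```python
import math

def entropy(p):
    """Two-group entropy p*ln(1/p) + q*ln(1/q), treating 0*ln(1/0) as 0."""
    return sum(v * math.log(1 / v) for v in (p, 1 - p) if v > 0)

def y_theil(p, P):
    """Scaled contact score for H; not defined when p equals P exactly."""
    Q = 1 - P
    E = entropy(P)
    return Q + ((E - entropy(p)) / E) / (p / P - (1 - p) / Q)

# Hypothetical illustration data: (White count, Black count) for each area.
areas = [(90, 10), (60, 40), (30, 70), (10, 90), (80, 20)]
W = sum(w for w, _ in areas)
B = sum(b for _, b in areas)
T = W + B
P = W / T

# H from the aggregate formula (1/T) * sum t_i (E - e_i) / E.
E = entropy(P)
H = sum((w + b) * (E - entropy(w / (w + b))) / E for w, b in areas) / T

# H as a difference of group means on the scaled scores.
ys = [y_theil(w / (w + b), P) for w, b in areas]
mean_white = sum(w * y for (w, _), y in zip(areas, ys)) / W
mean_black = sum(b * y for (_, b), y in zip(areas, ys)) / B

print(H, mean_white - mean_black)
print(y_theil(0.0, P), y_theil(1.0, P))   # endpoints of the 0-1 range
```

The two values on the first printed line should match up to rounding, and the 
second line should show scores at the ends of the 0-1 range. 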


A Loose End When p=P 


There is a final issue to deal with. Interestingly, the value of $y_i$ is undefined 
when $p_i$ is exactly equal to P. This is because the term $(p_i/P - q_i/Q)$ will then 
be 0 and the same also will be true of the term $[(E - e_i)/E]$. Thus the expression 
$[(E - e_i)/E]/(p_i/P - q_i/Q)$ will be undefined because it involves division by zero 
(it takes the form 0/0). As a practical matter, exact equality of $p_i$ and P is very 
rare in conventional empirical analyses of residential segregation in urban areas. 
Nevertheless, it is a logical possibility in empirical studies of segregation and it 
is quite likely to occur in methodological analyses and simulation studies. So it is 
necessary to establish a procedure for handling this situation. 

The procedure I adopt is the following: when $p_i$ is exactly P, assign a value for y 
based on the limiting values of y obtained by taking values of $p_i$ that are arbitrarily 
close to P, but are just short of reaching exactly P. For example, the value of y can 
be established in this way by averaging the two values of y obtained 
using $p_i = P - 0.0000001$ and $p_i = P + 0.0000001$. The two values of y will be 
exceedingly close; so close in fact that a graph of the y-p relationship will appear as 
a smooth, continuous function in which y rises monotonically as p ranges from 0 to 
1 with only an arbitrarily small “break” in the line at the exact point where $p_i = P$. 
The procedure suggested here would simply fill in this one point on the line. I offer 
this as a reasonable, practical strategy to follow until a better alternative is 
identified. 
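
The averaging procedure can be sketched as follows. The function names and the 
default offset `eps` are hypothetical, and treating any $p_i$ within `eps` of P as 
"equal" is an implementation choice made here for numerical stability, not 
something specified in the text. The closed-form value $Q + PQ \cdot \ln(P/Q)/E$ 
used as a cross-check is likewise not given in the text; it follows from applying 
L'Hôpital's rule to the 0/0 form and is included only to confirm that the averaged 
value behaves as a limit. 

```python
import math

def entropy(p):
    """Two-group entropy p*ln(1/p) + q*ln(1/q), treating 0*ln(1/0) as 0."""
    return sum(v * math.log(1 / v) for v in (p, 1 - p) if v > 0)

def y_theil_raw(p, P):
    """Scaled score for H from the formula; fails (0/0) when p == P."""
    Q = 1 - P
    E = entropy(P)
    return Q + ((E - entropy(p)) / E) / (p / P - (1 - p) / Q)

def y_theil(p, P, eps=1e-7):
    """Scaled score for H, filling in the point p == P by two-sided averaging."""
    if abs(p - P) < eps:   # assumption: near-equality is handled like exact equality
        return 0.5 * (y_theil_raw(P - eps, P) + y_theil_raw(P + eps, P))
    return y_theil_raw(p, P)

def limit_at_P(P):
    """Cross-check only: limit of y as p -> P via L'Hopital's rule (not from the text)."""
    Q = 1 - P
    return Q + P * Q * math.log(P / Q) / entropy(P)

for P in (0.2, 0.54, 0.8):
    print(P, y_theil(P, P), limit_at_P(P))
```

For each value of P the averaged score and the cross-check should agree closely. 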


Appendix F: Establishing the Scaling Function y=f(p) Needed 
to Cast the Hutchens’ Square Root Index (R) as a Difference 
of Group Means on Scaled Pairwise Contact 


In this appendix I establish the scaling function y = f(p) that accomplishes the goal 
of scoring residential outcomes (y) from area group proportions (p) such that the 
scores for y fall over the range 0-1 and yield the value of the Hutchens Square Root 
Index (R) as a difference of means on y for the two groups in the segregation 
comparison. The result is that, in the example of using R to assess White-Black 
segregation, $R = Y_W - Y_B$, where $Y_W$ and $Y_B$ are the group means for Whites and 
Blacks, respectively, on individual residential outcomes (y) scored from the value of 
the area group proportion (p) for the areas in which the individuals reside. 

The scaling function y = f(p) that places R in the difference of group means 
framework is developed below. Discussion of this function in the main body of this 
monograph notes that y is a nonlinear transformation of p that changes p from its 
original or “natural” metric to a new metric that exaggerates group differences on 
p over portions of the lower and upper ranges of p (i.e., roughly p < 0.25 and 
p > 0.75) and compresses group differences on p over middle portions of the range 
of p (i.e., roughly 0.30 < p < 0.70). The scaling function for R is very similar in 
shape and behavior to the scaling function for the Theil Entropy index (H). The 
main difference is that the nonlinearity in the scaling function for R is more 
pronounced; that is, it departs from linearity in the same basic manner as the scaling 
function for H, but the magnitude (amplitude) of the departure from linearity is 
consistently larger. 

To establish the function y = f(p), start with the desired equivalence 

\[
Y_W - Y_B = R.
\]

Next replace R with an expression adapted from the formula for R given in Hutchens 
(2001, 2004). 

\[
Y_W - Y_B = 1 - \sum_i \sqrt{(w_i/W)(b_i/B)}.
\]

Next replace $Y_W$ and $Y_B$ with the terms of their computing formulas as follows 

\[
(1/W) \cdot \sum_i w_i y_i - (1/B) \cdot \sum_i b_i y_i = 1 - \sum_i \sqrt{(w_i/W)(b_i/B)}.
\]

Then replace W and B with expressions based on T, P, and Q. Similarly, replace $w_i$ 
and $b_i$ with expressions based on $t_i$, $p_i$, and $q_i$ to obtain 

\[
(1/PT) \cdot \sum_i p_i t_i y_i - (1/QT) \cdot \sum_i q_i t_i y_i = 1 - \sum_i \sqrt{(p_i t_i/PT)(q_i t_i/QT)}.
\]

Then rearrange terms as follows. First, on the right side isolate $(t_i^2/T^2)$ inside 
the radical 

\[
(1/PT) \cdot \sum_i p_i t_i y_i - (1/QT) \cdot \sum_i q_i t_i y_i = 1 - \sum_i \sqrt{(t_i^2/T^2)(p_i/P)(q_i/Q)}.
\]

Then move $(t_i^2/T^2)$ outside of the radical as $(t_i/T)$ to obtain 

\[
(1/PT) \cdot \sum_i p_i t_i y_i - (1/QT) \cdot \sum_i q_i t_i y_i = 1 - \sum_i (t_i/T) \sqrt{(p_i/P)(q_i/Q)}.
\]

Restate $\sqrt{(p_i/P)(q_i/Q)}$ as $\sqrt{p_i q_i / PQ}$ to obtain 

\[
(1/PT) \cdot \sum_i p_i t_i y_i - (1/QT) \cdot \sum_i q_i t_i y_i = 1 - \sum_i (t_i/T) \sqrt{p_i q_i / PQ}.
\]

On the left side move P and Q inside the summations 

\[
(1/T) \cdot \sum_i (p_i/P) t_i y_i - (1/T) \cdot \sum_i (q_i/Q) t_i y_i = 1 - \sum_i (t_i/T) \sqrt{p_i q_i / PQ}.
\]

On the right side replace 1 with the equivalent expression $\sum_i t_i (1/T)$ and replace 
$(t_i/T)$ with $t_i (1/T)$ 

\[
(1/T) \cdot \sum_i (p_i/P) t_i y_i - (1/T) \cdot \sum_i (q_i/Q) t_i y_i = \sum_i t_i (1/T) - \sum_i t_i (1/T) \sqrt{p_i q_i / PQ}.
\]

Next reorganize on both sides 

\[
(1/T) \cdot \left[ \sum_i (p_i/P) t_i y_i - \sum_i (q_i/Q) t_i y_i \right] = \sum_i t_i \left( 1/T - (1/T) \sqrt{p_i q_i / PQ} \right).
\]

Next multiply both sides by T as follows 

\[
\sum_i (p_i/P) t_i y_i - \sum_i (q_i/Q) t_i y_i = T \cdot \left[ \sum_i t_i \left( 1/T - (1/T) \sqrt{p_i q_i / PQ} \right) \right].
\]

Then move T inside the summation on the right side to obtain 

\[
\sum_i (p_i/P) t_i y_i - \sum_i (q_i/Q) t_i y_i = \sum_i t_i \left( 1 - \sqrt{p_i q_i / PQ} \right).
\]

Next reorganize terms on the left side. 

\[
\sum_i t_i y_i \left[ (p_i/P) - (q_i/Q) \right] = \sum_i t_i \left( 1 - \sqrt{p_i q_i / PQ} \right).
\]

Then divide both sides by $[(p_i/P) - (q_i/Q)]$ 

\[
\sum_i t_i y_i = \sum_i t_i \left( 1 - \sqrt{p_i q_i / PQ} \right) / (p_i/P - q_i/Q).
\]

From the last expression, it is clear that 

\[
y_i = \left( 1 - \sqrt{p_i q_i / PQ} \right) / (p_i/P - q_i/Q).
\]


Adjusting the Range to 0-1 


An additional adjustment is required. Under the last expression, the scores for y will 
yield R as a difference of group means. However, the scores for y will not fall in the 
range 0-1 as desired. Instead, values of $y_i$ will range from -Q to P as $p_i$ varies 
from its minimum value of 0 to its maximum value of 1. That is, the expression 

\[
y_i = \left( 1 - \sqrt{p_i q_i / PQ} \right) / (p_i/P - q_i/Q)
\]

yields -Q when $p_i$ is 0 and P when $p_i$ is 1. Accordingly, the range for y can be set 
to 0-1 by incorporating the constant Q in the function as follows 

\[
y_i = Q + \left( 1 - \sqrt{p_i q_i / PQ} \right) / (p_i/P - q_i/Q).
\]
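
As with H, the result can be checked on a small example. The sketch below is an 
illustration under stated assumptions: the area counts are invented and the helper 
name `y_hutchens` is hypothetical. It computes R from the aggregate expression 
$1 - \sum_i \sqrt{(w_i/W)(b_i/B)}$ and compares it with the difference between the 
White and Black means of the scaled scores. 

```python
import math

def y_hutchens(p, P):
    """Scaled contact score for R; not defined when p equals P exactly."""
    Q = 1 - P
    return Q + (1 - math.sqrt(p * (1 - p) / (P * Q))) / (p / P - (1 - p) / Q)

# Hypothetical illustration data: (White count, Black count) for each area.
areas = [(90, 10), (60, 40), (30, 70), (10, 90), (80, 20)]
W = sum(w for w, _ in areas)
B = sum(b for _, b in areas)
P = W / (W + B)

# R from the aggregate expression adapted from Hutchens.
R = 1 - sum(math.sqrt((w / W) * (b / B)) for w, b in areas)

# R as a difference of group means on the scaled scores.
ys = [y_hutchens(w / (w + b), P) for w, b in areas]
mean_white = sum(w * y for (w, _), y in zip(areas, ys)) / W
mean_black = sum(b * y for (_, b), y in zip(areas, ys)) / B

print(R, mean_white - mean_black)
print(y_hutchens(0.0, P), y_hutchens(1.0, P))   # endpoints of the 0-1 range
```

The first printed line should show two matching values; the second should show the 
scores at the two ends of the 0-1 range. 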


A Loose End When p = P 


One final matter requires attention. It is that $y_i$ is undefined when $p_i$ is exactly 
equal to P because the term $[(p_i/P) - (q_i/Q)]$ will then be 0. Thus the expression 

\[
\left( 1 - \sqrt{p_i q_i / PQ} \right) / (p_i/P - q_i/Q)
\]

will be undefined because it will involve division by zero. As a practical matter, 
exact equality of $p_i$ and P is very rare in conventional empirical analyses of 
residential segregation in urban areas. Nevertheless, it is a logical possibility in 
empirical studies and it is especially likely to occur in methodological analyses and 
simulation studies. So it is necessary to establish a procedure for handling this 
situation. 

The option I adopt is as follows: when $p_i$ is exactly P, assign a value for y based 
on the limiting values of y obtained by taking values of $p_i$ that are arbitrarily 
close to P, but are not exactly P. For example, the value of y can be established 
in this way by averaging the two values of y obtained using $p_i = P - 0.0000001$ 
and $p_i = P + 0.0000001$. The two values of y will be exceedingly close; so close in 
fact that a graph of the y-p relationship will be a smooth, continuous function in 
which y rises monotonically as p ranges from 0 to 1 with only an arbitrarily small 
“break” in the line at the exact point where $p_i = P$. The procedure suggested here 
simply fills in this one point on the line. I offer this as a reasonable, practical 
strategy to follow until a better solution is identified. When this approach is 
adopted, an interesting regularity is observed; the value of y always converges on 
0.50 when $p_i$ is set arbitrarily close to P. 


An Observation 


There is another interesting regularity in the y-p relationship. It is that y is always 
equal to Q when $p_i = Q$. The basis for this regularity is that the expression 
$\sqrt{p_i q_i / PQ}$ takes the value of 1 when $p_i = Q$ (because then $q_i = P$ and 
$p_i q_i = PQ$). Accordingly, the expression 

\[
\left( 1 - \sqrt{p_i q_i / PQ} \right) / (p_i/P - q_i/Q)
\]

takes the value of 0, yielding the result of y = Q. The one exception is when Q is 
0.5. In that situation, P also is 0.5 and y is undefined as just described above. 
However, the above procedure of substituting 0.5 for y when $p_i = P$ also produces 
a result consistent with the regularity that y = Q when $p_i = Q$. 
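
Both regularities for R (the value 0.50 at $p_i = P$ and the value Q at $p_i = Q$) 
can be checked with a short sketch. The function names and the offset `eps` are 
hypothetical, and treating any $p_i$ within `eps` of P as "equal" is an 
implementation choice made here rather than part of the procedure described above. 

```python
import math

def y_hutchens_raw(p, P):
    """Scaled score for R from the formula; fails (0/0) when p == P."""
    Q = 1 - P
    return Q + (1 - math.sqrt(p * (1 - p) / (P * Q))) / (p / P - (1 - p) / Q)

def y_hutchens(p, P, eps=1e-7):
    """Scaled score for R, filling in the point p == P by two-sided averaging."""
    if abs(p - P) < eps:   # assumption: near-equality is handled like exact equality
        return 0.5 * (y_hutchens_raw(P - eps, P) + y_hutchens_raw(P + eps, P))
    return y_hutchens_raw(p, P)

for P in (0.2, 0.5, 0.54, 0.8):
    Q = 1 - P
    # First value should be near 0.5; the second and third should agree.
    print(P, y_hutchens(P, P), y_hutchens(Q, P), Q)
```

Note that the case P = 0.5 falls under both regularities at once, since the 
averaged value at $p_i = P$ is then also the value at $p_i = Q$. 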
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